Table of Contents
I.	Project Summary
II.	Workflow Checklist
III.	NGS Sequencing Results
IV.	Complete Report Download
V.	Raw Sequence Data Download
VI.	Analysis - DADA2 Read Processing
	Sample Meta Info
	Read Count by Sample
VII.	Analysis - Read Taxonomy Assignment
	Taxonomy Barplots
VIII.	Analysis - Alpha Diversity
IX.	Analysis - Beta Diversity
X.	Analysis - Differential Abundance
	ANCOM Result
	LEfSe Result
XI.	Analysis - Heatmap Profile
XII.	Analysis - Network Association
XIII.	Disclaimer


FOMC Service Report

16S rRNA Gene V1V3 Amplicon Sequencing

Version V1.52

Version History

The Forsyth Institute, Cambridge, MA, USA

January 23, 2026

Project ID: FOMC20260106

I. Project Summary

Services for project FOMC20260106 include NGS sequencing of 16S rRNA gene V1V3 region amplicons from the samples. First and foremost, please download this report as well as the raw sequence data from the download links provided below. These links will expire after 60 days, and we cannot guarantee the availability of your data after that time.

A full bioinformatics analysis service was requested. Our analyses start with raw sequence quality and noise filtering, paired-read merging, and chimera filtering, all performed with the DADA2 denoising algorithm and pipeline.

We also provide many downstream analyses such as taxonomy assignment, alpha and beta diversity analyses, and differential abundance analysis.

For taxonomy assignment, the most informative results are the taxonomy barplots. We provide interactive barplots that show the relative abundance of microbes at the taxonomic level of your choice, from phylum to species.

If you specify which groups of samples you want to compare, we provide both ANCOM and LEfSe differential abundance analyses.

II. Workflow Checklist

☑	1.	Sample Received
☑	2.	Sample Quality Evaluated
☑	3.	Sample Prepared for Sequencing
☑	4.	Next-Gen Sequencing
☑	5.	Sequence Quality Check
☑	6.	Absolute Abundance
☑	7.	Report and Raw Sequence Data Available for Download
☑	8.	Bioinformatics Analysis - Reads Processing (DADA2 Quality Trimming, Denoising, Paired Reads Merging)
☑	9.	Bioinformatics Analysis - Reads Taxonomy Assignment
☑	10.	Bioinformatics Analysis - Alpha Diversity Analysis
☑	11.	Bioinformatics Analysis - Beta Diversity Analysis
☑	12.	Bioinformatics Analysis - Differential Abundance Analysis
☑	13.	Bioinformatics Analysis - Heatmap Profile
☑	14.	Bioinformatics Analysis - Network Association

III. NGS Sequencing

The samples were processed and analyzed with the ZymoBIOMICS® Service: Targeted Metagenomic Sequencing (Zymo Research, Irvine, CA).

DNA Extraction: If DNA extraction was performed, the following DNA extraction kit was used according to the manufacturer’s instructions:

☑	ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA)
☐	N/A (DNA Extraction Not Performed)
Elution Volume: 50µL
Additional Notes: NA

Targeted Library Preparation: The DNA samples were prepared for targeted sequencing with the Quick-16S™ NGS Library Prep Kit (Zymo Research, Irvine, CA). The kit's primers were custom designed by Zymo Research to provide the best coverage of the 16S gene while maintaining high sensitivity. The primer set used in this project is marked below:

☐	Quick-16S™ Primer Set V1-V2 (Zymo Research, Irvine, CA)
☑	Quick-16S™ Primer Set V1-V3 (Zymo Research, Irvine, CA)
☐	Quick-16S™ Primer Set V3-V4 (Zymo Research, Irvine, CA)
☐	Quick-16S™ Primer Set V4 (Zymo Research, Irvine, CA)
☐	Quick-16S™ Primer Set V6-V8 (Zymo Research, Irvine, CA)
Additional Notes: NA

The sequencing library was prepared using an innovative library preparation process in which PCR reactions were performed in real-time PCR machines to control cycles and therefore limit PCR chimera formation. The final PCR products were quantified with qPCR fluorescence readings and pooled together in equimolar amounts. The final pooled library was cleaned up with the Select-a-Size DNA Clean & Concentrator™ (Zymo Research, Irvine, CA), then quantified with TapeStation® (Agilent Technologies, Santa Clara, CA) and Qubit® (Thermo Fisher Scientific, Waltham, MA).

Control Samples: The ZymoBIOMICS® Microbial Community Standard (Zymo Research, Irvine, CA) was used as a positive control for each DNA extraction, if performed. The ZymoBIOMICS® Microbial Community DNA Standard (Zymo Research, Irvine, CA) was used as a positive control for each targeted library preparation. Negative controls (i.e. blank extraction control, blank library preparation control) were included to assess the level of bioburden carried by the wet-lab process.

Sequencing: The final library was sequenced on an Illumina® NextSeq 2000™ with a P1 reagent kit (600 cycles) (Illumina, San Diego, CA). The sequencing was performed with a 25% PhiX spike-in.

Absolute Abundance Quantification*: A quantitative real-time PCR was set up with a standard curve. The standard curve was made with plasmid DNA containing one copy of the 16S gene and one copy of the fungal ITS2 region, prepared in 10-fold serial dilutions. The primers used were the same as those used in Targeted Library Preparation. The equation generated by the plasmid DNA standard curve was used to calculate the number of gene copies in the reaction for each sample. The PCR input volume (2 µL) was used to calculate the number of gene copies per microliter in each DNA sample.
The number of genome copies per microliter of DNA sample was calculated by dividing the gene copy number by an assumed number of gene copies per genome. The value used for 16S copies per genome is 4, and the value used for ITS copies per genome is 200. The amount of DNA per microliter of DNA sample was calculated using an assumed genome size of 4.64 × 10⁶ bp (the genome size of Escherichia coli) for 16S samples, or 1.20 × 10⁷ bp (the genome size of Saccharomyces cerevisiae) for ITS samples. This calculation is shown below:

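$$\text{Calculated Total DNA} = \frac{\text{Calculated Total Genome Copies} \times \text{Assumed Genome Size}\ (4.64 \times 10^{6}\ \text{bp}) \times \text{Average Molecular Weight of a DNA bp}\ (660\ \text{g/mol/bp})}{\text{Avogadro's Number}\ (6.022 \times 10^{23}\ \text{mol}^{-1})}$$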

* Absolute Abundance Quantification is only available for 16S and ITS analyses.
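The absolute abundance steps above can be sketched in Python as follows. This is a minimal illustration, not the actual FOMC or Zymo pipeline: it assumes the standard curve is the usual log-linear fit Ct = slope × log10(copies) + intercept, and the slope, intercept, and Ct in the usage line are hypothetical values. The copies per genome, genome sizes, 2 µL input volume, and constants are the ones stated in this report.

```python
import math

# Values stated in this report
PCR_INPUT_UL = 2.0                               # µL of DNA sample per qPCR reaction
COPIES_PER_GENOME = {"16S": 4, "ITS": 200}       # assumed gene copies per genome
GENOME_SIZE_BP = {"16S": 4.64e6, "ITS": 1.20e7}  # E. coli / S. cerevisiae
BP_MOLAR_MASS = 660.0                            # g/mol per base pair
AVOGADRO = 6.022e23                              # per mol


def copies_in_reaction(ct: float, slope: float, intercept: float) -> float:
    """Invert an assumed standard curve Ct = slope * log10(copies) + intercept."""
    return math.pow(10.0, (ct - intercept) / slope)


def absolute_abundance(ct: float, slope: float, intercept: float, target: str = "16S"):
    """Return (gene copies/µL, genome copies/µL, DNA ng/µL) for one DNA sample."""
    gene_per_ul = copies_in_reaction(ct, slope, intercept) / PCR_INPUT_UL
    genome_per_ul = gene_per_ul / COPIES_PER_GENOME[target]
    dna_g_per_ul = genome_per_ul * GENOME_SIZE_BP[target] * BP_MOLAR_MASS / AVOGADRO
    return gene_per_ul, genome_per_ul, dna_g_per_ul * 1e9  # grams -> nanograms


# Hypothetical standard curve and Ct value, for illustration only
gene, genomes, ng = absolute_abundance(ct=15.1, slope=-3.32, intercept=38.0)
print(f"{gene:.3g} gene copies/µL, {genomes:.3g} genomes/µL, {ng:.3g} ng DNA/µL")
```

For ITS samples, pass `target="ITS"` so that the 200 copies per genome and the 1.20 × 10⁷ bp genome size are used instead.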

The absolute abundance standard curve data can be viewed in Excel here:

The absolute abundance standard curve is shown below:

Absolute Abundance Standard Curve

IV. Complete Report Download

The complete report of your project, including all links in this report, can be downloaded by clicking the link provided below. The downloaded file is a compressed ZIP file. Once it is unzipped, open the file “REPORT.html” (it may be shown only as "REPORT" on your computer) by double-clicking it. Your default web browser will open it and display the exact content of this report.

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Complete report download link:

To view the report, please follow these steps:

1.	Download the .zip file from the report link above.
2.	Extract all the contents of the downloaded .zip file to your desktop.
3.	Open the extracted folder and find the file "REPORT.html" (it may be shown only as "REPORT").
4.	Open the REPORT.html file by double-clicking it. Your default browser will open the top page of the complete report. Within the report, there are links to view all the analyses performed for the project.

V. Raw Sequence Data Download

The raw NGS sequence data are available for download via the link provided below as a single compressed ZIP file. Because this project used Illumina paired-end sequencing of the 16S rRNA V1V3 region, unzipping the file yields two sequence files per sample, one for Read 1 (R1) and one for Read 2 (R2), each with the file extension “*.fastq.gz”. These are gzip-compressed FASTQ files; FASTQ is a text-based format that stores both a biological sequence and its corresponding quality scores, and most sequence analysis software can open it. The Sample IDs associated with the FASTQ files are listed in the table below:


Sample ID	Original Sample ID	Read 1 File Name	Read 2 File Name
F20260106.S10	original sample ID here	zr20260106_10V1V3_R1.fastq.gz	zr20260106_10V1V3_R2.fastq.gz
F20260106.S11	original sample ID here	zr20260106_11V1V3_R1.fastq.gz	zr20260106_11V1V3_R2.fastq.gz
F20260106.S12	original sample ID here	zr20260106_12V1V3_R1.fastq.gz	zr20260106_12V1V3_R2.fastq.gz
F20260106.S13	original sample ID here	zr20260106_13V1V3_R1.fastq.gz	zr20260106_13V1V3_R2.fastq.gz
F20260106.S14	original sample ID here	zr20260106_14V1V3_R1.fastq.gz	zr20260106_14V1V3_R2.fastq.gz
F20260106.S15	original sample ID here	zr20260106_15V1V3_R1.fastq.gz	zr20260106_15V1V3_R2.fastq.gz
F20260106.S16	original sample ID here	zr20260106_16V1V3_R1.fastq.gz	zr20260106_16V1V3_R2.fastq.gz
F20260106.S17	original sample ID here	zr20260106_17V1V3_R1.fastq.gz	zr20260106_17V1V3_R2.fastq.gz
F20260106.S18	original sample ID here	zr20260106_18V1V3_R1.fastq.gz	zr20260106_18V1V3_R2.fastq.gz
F20260106.S19	original sample ID here	zr20260106_19V1V3_R1.fastq.gz	zr20260106_19V1V3_R2.fastq.gz
F20260106.S01	original sample ID here	zr20260106_1V1V3_R1.fastq.gz	zr20260106_1V1V3_R2.fastq.gz
F20260106.S20	original sample ID here	zr20260106_20V1V3_R1.fastq.gz	zr20260106_20V1V3_R2.fastq.gz
F20260106.S21	original sample ID here	zr20260106_21V1V3_R1.fastq.gz	zr20260106_21V1V3_R2.fastq.gz
F20260106.S22	original sample ID here	zr20260106_22V1V3_R1.fastq.gz	zr20260106_22V1V3_R2.fastq.gz
F20260106.S23	original sample ID here	zr20260106_23V1V3_R1.fastq.gz	zr20260106_23V1V3_R2.fastq.gz
F20260106.S24	original sample ID here	zr20260106_24V1V3_R1.fastq.gz	zr20260106_24V1V3_R2.fastq.gz
F20260106.S25	original sample ID here	zr20260106_25V1V3_R1.fastq.gz	zr20260106_25V1V3_R2.fastq.gz
F20260106.S26	original sample ID here	zr20260106_26V1V3_R1.fastq.gz	zr20260106_26V1V3_R2.fastq.gz
F20260106.S27	original sample ID here	zr20260106_27V1V3_R1.fastq.gz	zr20260106_27V1V3_R2.fastq.gz
F20260106.S28	original sample ID here	zr20260106_28V1V3_R1.fastq.gz	zr20260106_28V1V3_R2.fastq.gz
F20260106.S29	original sample ID here	zr20260106_29V1V3_R1.fastq.gz	zr20260106_29V1V3_R2.fastq.gz
F20260106.S02	original sample ID here	zr20260106_2V1V3_R1.fastq.gz	zr20260106_2V1V3_R2.fastq.gz
F20260106.S30	original sample ID here	zr20260106_30V1V3_R1.fastq.gz	zr20260106_30V1V3_R2.fastq.gz
F20260106.S31	original sample ID here	zr20260106_31V1V3_R1.fastq.gz	zr20260106_31V1V3_R2.fastq.gz
F20260106.S32	original sample ID here	zr20260106_32V1V3_R1.fastq.gz	zr20260106_32V1V3_R2.fastq.gz
F20260106.S33	original sample ID here	zr20260106_33V1V3_R1.fastq.gz	zr20260106_33V1V3_R2.fastq.gz
F20260106.S34	original sample ID here	zr20260106_34V1V3_R1.fastq.gz	zr20260106_34V1V3_R2.fastq.gz
F20260106.S35	original sample ID here	zr20260106_35V1V3_R1.fastq.gz	zr20260106_35V1V3_R2.fastq.gz
F20260106.S36	original sample ID here	zr20260106_36V1V3_R1.fastq.gz	zr20260106_36V1V3_R2.fastq.gz
F20260106.S37	original sample ID here	zr20260106_37V1V3_R1.fastq.gz	zr20260106_37V1V3_R2.fastq.gz
F20260106.S38	original sample ID here	zr20260106_38V1V3_R1.fastq.gz	zr20260106_38V1V3_R2.fastq.gz
F20260106.S39	original sample ID here	zr20260106_39V1V3_R1.fastq.gz	zr20260106_39V1V3_R2.fastq.gz
F20260106.S03	original sample ID here	zr20260106_3V1V3_R1.fastq.gz	zr20260106_3V1V3_R2.fastq.gz
F20260106.S40	original sample ID here	zr20260106_40V1V3_R1.fastq.gz	zr20260106_40V1V3_R2.fastq.gz
F20260106.S41	original sample ID here	zr20260106_41V1V3_R1.fastq.gz	zr20260106_41V1V3_R2.fastq.gz
F20260106.S42	original sample ID here	zr20260106_42V1V3_R1.fastq.gz	zr20260106_42V1V3_R2.fastq.gz
F20260106.S43	original sample ID here	zr20260106_43V1V3_R1.fastq.gz	zr20260106_43V1V3_R2.fastq.gz
F20260106.S44	original sample ID here	zr20260106_44V1V3_R1.fastq.gz	zr20260106_44V1V3_R2.fastq.gz
F20260106.S45	original sample ID here	zr20260106_45V1V3_R1.fastq.gz	zr20260106_45V1V3_R2.fastq.gz
F20260106.S46	original sample ID here	zr20260106_46V1V3_R1.fastq.gz	zr20260106_46V1V3_R2.fastq.gz
F20260106.S47	original sample ID here	zr20260106_47V1V3_R1.fastq.gz	zr20260106_47V1V3_R2.fastq.gz
F20260106.S48	original sample ID here	zr20260106_48V1V3_R1.fastq.gz	zr20260106_48V1V3_R2.fastq.gz
F20260106.S49	original sample ID here	zr20260106_49V1V3_R1.fastq.gz	zr20260106_49V1V3_R2.fastq.gz
F20260106.S04	original sample ID here	zr20260106_4V1V3_R1.fastq.gz	zr20260106_4V1V3_R2.fastq.gz
F20260106.S50	original sample ID here	zr20260106_50V1V3_R1.fastq.gz	zr20260106_50V1V3_R2.fastq.gz
F20260106.S51	original sample ID here	zr20260106_51V1V3_R1.fastq.gz	zr20260106_51V1V3_R2.fastq.gz
F20260106.S52	original sample ID here	zr20260106_52V1V3_R1.fastq.gz	zr20260106_52V1V3_R2.fastq.gz
F20260106.S53	original sample ID here	zr20260106_53V1V3_R1.fastq.gz	zr20260106_53V1V3_R2.fastq.gz
F20260106.S54	original sample ID here	zr20260106_54V1V3_R1.fastq.gz	zr20260106_54V1V3_R2.fastq.gz
F20260106.S55	original sample ID here	zr20260106_55V1V3_R1.fastq.gz	zr20260106_55V1V3_R2.fastq.gz
F20260106.S56	original sample ID here	zr20260106_56V1V3_R1.fastq.gz	zr20260106_56V1V3_R2.fastq.gz
F20260106.S57	original sample ID here	zr20260106_57V1V3_R1.fastq.gz	zr20260106_57V1V3_R2.fastq.gz
F20260106.S58	original sample ID here	zr20260106_58V1V3_R1.fastq.gz	zr20260106_58V1V3_R2.fastq.gz
F20260106.S59	original sample ID here	zr20260106_59V1V3_R1.fastq.gz	zr20260106_59V1V3_R2.fastq.gz
F20260106.S05	original sample ID here	zr20260106_5V1V3_R1.fastq.gz	zr20260106_5V1V3_R2.fastq.gz
F20260106.S60	original sample ID here	zr20260106_60V1V3_R1.fastq.gz	zr20260106_60V1V3_R2.fastq.gz
F20260106.S61	original sample ID here	zr20260106_61V1V3_R1.fastq.gz	zr20260106_61V1V3_R2.fastq.gz
F20260106.S62	original sample ID here	zr20260106_62V1V3_R1.fastq.gz	zr20260106_62V1V3_R2.fastq.gz
F20260106.S63	original sample ID here	zr20260106_63V1V3_R1.fastq.gz	zr20260106_63V1V3_R2.fastq.gz
F20260106.S64	original sample ID here	zr20260106_64V1V3_R1.fastq.gz	zr20260106_64V1V3_R2.fastq.gz
F20260106.S65	original sample ID here	zr20260106_65V1V3_R1.fastq.gz	zr20260106_65V1V3_R2.fastq.gz
F20260106.S66	original sample ID here	zr20260106_66V1V3_R1.fastq.gz	zr20260106_66V1V3_R2.fastq.gz
F20260106.S67	original sample ID here	zr20260106_67V1V3_R1.fastq.gz	zr20260106_67V1V3_R2.fastq.gz
F20260106.S68	original sample ID here	zr20260106_68V1V3_R1.fastq.gz	zr20260106_68V1V3_R2.fastq.gz
F20260106.S69	original sample ID here	zr20260106_69V1V3_R1.fastq.gz	zr20260106_69V1V3_R2.fastq.gz
F20260106.S06	original sample ID here	zr20260106_6V1V3_R1.fastq.gz	zr20260106_6V1V3_R2.fastq.gz
F20260106.S70	original sample ID here	zr20260106_70V1V3_R1.fastq.gz	zr20260106_70V1V3_R2.fastq.gz
F20260106.S71	original sample ID here	zr20260106_71V1V3_R1.fastq.gz	zr20260106_71V1V3_R2.fastq.gz
F20260106.S72	original sample ID here	zr20260106_72V1V3_R1.fastq.gz	zr20260106_72V1V3_R2.fastq.gz
F20260106.S73	original sample ID here	zr20260106_73V1V3_R1.fastq.gz	zr20260106_73V1V3_R2.fastq.gz
F20260106.S74	original sample ID here	zr20260106_74V1V3_R1.fastq.gz	zr20260106_74V1V3_R2.fastq.gz
F20260106.S75	original sample ID here	zr20260106_75V1V3_R1.fastq.gz	zr20260106_75V1V3_R2.fastq.gz
F20260106.S76	original sample ID here	zr20260106_76V1V3_R1.fastq.gz	zr20260106_76V1V3_R2.fastq.gz
F20260106.S77	original sample ID here	zr20260106_77V1V3_R1.fastq.gz	zr20260106_77V1V3_R2.fastq.gz
F20260106.S78	original sample ID here	zr20260106_78V1V3_R1.fastq.gz	zr20260106_78V1V3_R2.fastq.gz
F20260106.S79	original sample ID here	zr20260106_79V1V3_R1.fastq.gz	zr20260106_79V1V3_R2.fastq.gz
F20260106.S07	original sample ID here	zr20260106_7V1V3_R1.fastq.gz	zr20260106_7V1V3_R2.fastq.gz
F20260106.S80	original sample ID here	zr20260106_80V1V3_R1.fastq.gz	zr20260106_80V1V3_R2.fastq.gz
F20260106.S81	original sample ID here	zr20260106_81V1V3_R1.fastq.gz	zr20260106_81V1V3_R2.fastq.gz
F20260106.S82	original sample ID here	zr20260106_82V1V3_R1.fastq.gz	zr20260106_82V1V3_R2.fastq.gz
F20260106.S83	original sample ID here	zr20260106_83V1V3_R1.fastq.gz	zr20260106_83V1V3_R2.fastq.gz
F20260106.S84	original sample ID here	zr20260106_84V1V3_R1.fastq.gz	zr20260106_84V1V3_R2.fastq.gz
F20260106.S08	original sample ID here	zr20260106_8V1V3_R1.fastq.gz	zr20260106_8V1V3_R2.fastq.gz
F20260106.S09	original sample ID here	zr20260106_9V1V3_R1.fastq.gz	zr20260106_9V1V3_R2.fastq.gz
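The file names in the table above follow a regular pattern, and each file is a gzip-compressed FASTQ file. A minimal Python sketch for mapping a file name back to its Sample ID and for reading the records is shown below. It assumes the naming pattern seen in this table (zr20260106_<number>V1V3_<R1|R2>.fastq.gz), the standard four-line FASTQ record layout, and the Phred+33 quality encoding used by current Illumina instruments; the function names are illustrative only.

```python
import gzip
import re
from typing import Iterator, List, Tuple

# Naming pattern observed in the table above (an assumption for other projects)
FILE_RE = re.compile(r"^zr20260106_(\d+)V1V3_(R[12])\.fastq\.gz$")


def sample_id_from_filename(name: str) -> Tuple[str, str]:
    """Map 'zr20260106_7V1V3_R1.fastq.gz' to ('F20260106.S07', 'R1')."""
    match = FILE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return f"F20260106.S{int(match.group(1)):02d}", match.group(2)


def phred_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def read_fastq(path: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header[1:].split()[0], sequence, phred_scores(quality)
```

For example, `sum(1 for _ in read_fastq("zr20260106_7V1V3_R1.fastq.gz"))` would count the R1 reads of sample F20260106.S07 after the ZIP file has been extracted.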

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Raw sequence data download link:

VI. Analysis - DADA2 Read Processing

What is DADA2?

DADA2 is a software package that models and corrects Illumina-sequenced amplicon errors [1]. DADA2 infers sample sequences exactly, without coarse-graining into OTUs, and resolves differences of as little as one nucleotide. In the benchmarks reported in [1], DADA2 identified more real variants and output fewer spurious sequences than other methods.

DADA2’s advantage is that it uses more of the data. The DADA2 error model incorporates quality information, which is ignored by all other methods after filtering. It incorporates quantitative abundances, whereas most other methods use abundance ranks, if they use abundance at all. It also identifies the type of difference between sequences (e.g., A→C), whereas other methods merely count mismatches. Finally, DADA2 can parameterize its error model from the data itself, rather than relying on previous datasets that may or may not reflect the PCR and sequencing protocols used in your study.

DADA2 is available as an R package at: https://benjjneb.github.io/dada2/index.html

References

[1] Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016 Jul;13(7):581-3. doi: 10.1038/nmeth.3869. Epub 2016 May 23. PMID: 27214047; PMCID: PMC4927377.

Analysis Procedures:

The DADA2 pipeline includes several tools for read quality control, including quality filtering, trimming, denoising, paired-read merging, and chimera filtering. The major processing steps of DADA2 are described below:

Step 1. Read trimming based on sequence quality. The quality of Illumina reads often decreases toward the end of the read. DADA2 can trim off the poor-quality read ends to improve error-model building and paired-read merging performance.
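A simplified Python sketch of truncation followed by an expected-error filter is shown below. It mirrors the idea behind the truncLen, maxN, and maxEE arguments of DADA2's filterAndTrim function, but it is an illustration only: the default `max_ee` of 2.0 is an assumed value, and other filterAndTrim options (such as quality-based truncation with truncQ) are omitted. Expected errors are the sum of the per-base error probabilities implied by the quality scores.

```python
from typing import List, Optional, Tuple


def expected_errors(quals: List[int]) -> float:
    """Sum of per-base error probabilities, 10^(-Q/10), over a read."""
    return sum(10 ** (-q / 10) for q in quals)


def truncate_and_filter(seq: str, quals: List[int], trunc_len: int,
                        max_ee: float = 2.0) -> Optional[Tuple[str, List[int]]]:
    """Truncate a read to trunc_len bases, then apply simple quality filters.

    Returns None when the read is discarded: shorter than trunc_len,
    containing an ambiguous base (N), or exceeding max_ee expected errors.
    """
    if len(seq) < trunc_len:
        return None  # reads shorter than the truncation length are dropped
    seq, quals = seq[:trunc_len], quals[:trunc_len]
    if "N" in seq:
        return None
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals
```

Truncating before computing expected errors matters: the low-quality read ends removed in this step would otherwise push many reads over the expected-error limit.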

Step 2. Learn the error rates. The DADA2 algorithm makes use of a parametric error model (err), and every amplicon dataset has a different set of error rates. The learnErrors method learns this error model from the data by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors).

Step 3. Infer amplicon sequence variants (ASVs) based on the error model built in the previous step. This step is also called sequence "denoising". The outcome of this step is a list of ASVs, exact sequences that serve as a higher-resolution analogue of traditional OTUs.
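The DADA2 paper [1] decides whether a group of reads is a genuine sequence variant by asking how likely that many copies are to arise from sequencing errors of a more abundant sequence. A simplified Python sketch of that abundance p-value is shown below. It assumes the error rate `lam_ji` (the probability that one read of sequence j is produced as sequence i, which DADA2 derives from the learned error model) is already known, and computes P(X ≥ a_i | X ≥ 1) for X ~ Poisson(n_j × lam_ji), where a_i is the abundance of the candidate sequence and n_j the abundance of the more abundant one. It is an illustration, not the package's implementation.

```python
import math


def poisson_upper_tail(a: int, mu: float) -> float:
    """P(X >= a) for X ~ Poisson(mu), summed term by term in log space."""
    total, k = 0.0, a
    while True:
        term = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
        total += term
        # stop once past the mode and further terms are negligible
        if k > mu and term <= total * 1e-16:
            return total
        k += 1


def abundance_p_value(a_i: int, n_j: int, lam_ji: float) -> float:
    """Probability of at least a_i error copies of sequence i, given at least one."""
    if a_i < 1 or n_j < 1 or lam_ji <= 0:
        raise ValueError("need a_i >= 1, n_j >= 1 and lam_ji > 0")
    mu = n_j * lam_ji  # expected number of error copies of i produced from j
    return poisson_upper_tail(a_i, mu) / -math.expm1(-mu)  # divide by P(X >= 1)


# Five copies where only 0.01 are expected: a very small p-value suggests
# that sequence i is not just an error copy of sequence j
print(abundance_p_value(a_i=5, n_j=10_000, lam_ji=1e-6))
```

In DADA2, p-values like this are compared against a very stringent threshold (the OMEGA_A option), so only sequences that are highly unlikely to be errors are kept as separate ASVs.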

Step 4. Merge paired reads. If the sequencing products are read pairs, DADA2 will merge the R1 and R2 ASVs into single sequences. Merging is performed by aligning the denoised forward reads with the reverse-complement of the corresponding denoised reverse reads, and then constructing the merged “contig” sequences. By default, merged sequences are only output if the forward and reverse reads overlap by at least 12 bases, and are identical to each other in the overlap region (but these conditions can be changed via function arguments).
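The merging rule described above (at least 12 overlapping bases and no mismatches by default, mirroring the minOverlap and maxMismatch arguments of DADA2's mergePairs) can be illustrated with a simplified Python sketch. Unlike mergePairs, which aligns the reads, this version assumes an ungapped overlap and simply tries overlap lengths from longest to shortest; the function names and the toy sequence are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 12,
               max_mismatch: int = 0) -> Optional[str]:
    """Merge a forward read with the reverse complement of its reverse read.

    Assumes an ungapped overlap between the 3' end of fwd and the
    5' end of the reverse-complemented reverse read.
    """
    rc = reverse_complement(rev)
    longest = min(len(fwd), len(rc))
    for overlap in range(longest, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-overlap:], rc[:overlap]))
        if mismatches <= max_mismatch:
            return fwd + rc[overlap:]  # contig = forward read + non-overlapping tail
    return None  # overlap too short or too many mismatches


template = "ACGTTGCATGCCTAGGATCCGATTACAGGCTTAACGGTCA"
fwd = template[:28]                      # forward read covers bases 1-28
rev = reverse_complement(template[12:])  # reverse read covers bases 13-40
print(merge_pair(fwd, rev) == template)  # 16-base overlap, merges cleanly
```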

Step 5. Remove chimeras. The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs. A sequence is identified as chimeric if it can be exactly reconstructed by combining a left segment and a right segment from two more abundant “parent” sequences. The frequency of chimeric sequences varies substantially from dataset to dataset and depends on factors including experimental procedures and sample complexity.
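The chimera definition above can be expressed as a short Python check. This is a simplified sketch, not DADA2's removeBimeraDenovo: it assumes the query and its candidate parents have the same length and are already aligned (no indels), and that the caller passes in only sequences more abundant than the query. The toy sequences are hypothetical.

```python
from typing import List


def _shared_prefix(a: str, b: str) -> int:
    """Number of leading positions at which a and b agree."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


def is_bimera(query: str, parents: List[str]) -> bool:
    """True if query = left segment of one parent + right segment of another."""
    n = len(query)
    candidates = [p for p in parents if len(p) == n and p != query]
    left = [_shared_prefix(query, p) for p in candidates]               # exact left match
    right = [_shared_prefix(query[::-1], p[::-1]) for p in candidates]  # exact right match
    # a breakpoint exists if one parent's left match meets another's right match
    return any(
        left[i] + right[j] >= n
        for i in range(len(candidates))
        for j in range(len(candidates))
        if i != j
    )


parents = ["AAAAAAAAAA", "CCCCCCCCCC"]
print(is_bimera("AAAAACCCCC", parents))  # left half from one parent, right half from the other
print(is_bimera("AAAAAGGGGG", parents))  # right half matches neither parent
```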

Results

1. Read Quality Plots. NGS sequence analysis starts with visualizing the quality of the sequencing. Below are the quality plots of the first sample for the R1 and R2 reads separately. In gray-scale is a heat map of the frequency of each quality score at each base position. The mean quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines. The forward reads are usually of better quality. It is common practice to trim the last few nucleotides to avoid less well-controlled errors that can arise there. The trimming affects the downstream steps, including error model building, merging, and chimera calling. FOMC uses an empirical approach that tests many combinations of trim lengths in order to obtain the best final amplicon sequence variants (ASVs); see the next section, “Optimal trim length for ASVs”.
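The per-position summaries in these plots (mean and quartiles of the quality score at each base position) can be computed with a short Python sketch like the one below. It is an illustration only, not the code used to draw the plots: it assumes quality scores have already been decoded to integers (for example from Phred+33 strings) and uses the inclusive quantile method from Python's statistics module. The toy reads are hypothetical.

```python
import statistics
from typing import Dict, List


def quality_profile(reads: List[List[int]]) -> List[Dict[str, float]]:
    """Mean and quartiles of Phred scores at each position across reads."""
    profile = []
    max_len = max(len(q) for q in reads)
    for pos in range(max_len):
        column = [q[pos] for q in reads if len(q) > pos]  # shorter reads drop out
        if len(column) >= 2:
            q25, median, q75 = statistics.quantiles(column, n=4, method="inclusive")
        else:
            q25 = median = q75 = float(column[0])
        profile.append({
            "position": pos + 1,
            "reads": len(column),
            "mean": statistics.fmean(column),
            "q25": q25,
            "median": median,
            "q75": q75,
        })
    return profile


# Four toy reads; the last two are only one base long
for row in quality_profile([[10, 30], [20, 30], [30], [40]]):
    print(row)
```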

Quality plots for all samples:

quality_plots_1-20.pdf

quality_plots_21-40.pdf

quality_plots_41-60.pdf

quality_plots_61-80.pdf

quality_plots_81-84.pdf

2. Optimal trim length for ASVs. The final number of merged and chimera-filtered ASVs depends on the quality filtering (hence trimming) at the very beginning of the DADA2 pipeline. To maximize the proportion of input reads retained in the final ASVs, the following empirical approach was used:

Create a random subset of each sample consisting of 5,000 R1 and 5,000 R2 reads (to reduce computation time)
Trim 10 bases at a time from the ends of both R1 and R2, up to 50 bases
For each combination of trimmed lengths (e.g., 251×251, 251×241, 241×241, etc.), run the trimmed reads through the entire DADA2 pipeline to obtain chimera-filtered, merged ASVs
Select the combination with the highest percentage of input reads retained in the final ASVs for the complete dataset

The result is shown below as the percentage of total reads retained in the final ASVs for every trimming combination (first column = R1 length in bases; first row = R2 length in bases):

R1/R2	251	241	231	221	211	201
251	64.70%	65.13%	64.86%	64.41%	63.20%	63.26%
241	64.48%	65.07%	64.81%	64.58%	63.18%	63.15%
231	64.73%	65.17%	65.06%	64.88%	64.16%	64.03%
221	64.56%	65.15%	65.26%	64.68%	64.52%	64.49%
211	64.94%	65.66%	65.91%	65.41%	65.20%	65.05%
201	65.32%	66.20%	66.52%	66.29%	65.68%	65.53%

Based on the above result, the trim length combination of R1 = 201 bases and R2 = 231 bases (66.52%, the highest value in the table) was chosen for generating the final ASVs for all sequences. This combination retained the highest percentage of reads as merged, non-chimeric ASVs and was used for downstream analyses, if requested.
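The selection rule can be reproduced directly from the table. The Python sketch below copies the reported percentages and returns the combination with the highest value; it illustrates only the final selection step, not the subsampling and DADA2 runs that produced the numbers.

```python
from typing import Dict, List, Tuple

# Percent of input reads retained as final ASVs, copied from the table above.
# Keys are R1 truncation lengths; list positions follow R2_LENGTHS.
R2_LENGTHS = [251, 241, 231, 221, 211, 201]
RETAINED_PCT: Dict[int, List[float]] = {
    251: [64.70, 65.13, 64.86, 64.41, 63.20, 63.26],
    241: [64.48, 65.07, 64.81, 64.58, 63.18, 63.15],
    231: [64.73, 65.17, 65.06, 64.88, 64.16, 64.03],
    221: [64.56, 65.15, 65.26, 64.68, 64.52, 64.49],
    211: [64.94, 65.66, 65.91, 65.41, 65.20, 65.05],
    201: [65.32, 66.20, 66.52, 66.29, 65.68, 65.53],
}


def best_trim(table: Dict[int, List[float]],
              r2_lengths: List[int]) -> Tuple[int, int, float]:
    """Return (R1 length, R2 length, percent) with the highest retained percentage."""
    return max(
        ((r1, r2, pct) for r1, row in table.items() for r2, pct in zip(r2_lengths, row)),
        key=lambda combo: combo[2],
    )


print(best_trim(RETAINED_PCT, R2_LENGTHS))  # (201, 231, 66.52)
```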

3. Error plots from learning the error rates. After DADA2 builds the error model for the dataset, it is always worthwhile, as a sanity check if nothing else, to visualize the estimated error rates. The error rates for each possible transition (A→C, A→G, …) are shown below. Points are the observed error rates for each consensus quality score. The black line shows the estimated error rates after convergence of the machine-learning algorithm. The red line shows the error rates expected under the nominal definition of the Q-score. Ideally, the estimated error rates (black line) fit the observed rates (points) well, and the error rates drop as quality increases.
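Two of the quantities in these plots can be illustrated with a small Python sketch: the nominal Q-score error rate (red line) is 10^(−Q/10), and an observed rate (points) is the fraction of bases with a given true base and quality score that were called as a particular base. The sketch assumes each base call has already been paired with its "true" base (in DADA2 this comes from the denoised sequence the read was assigned to); it does not reproduce DADA2's model fitting (black line), and the toy calls are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def nominal_error_rate(q: int) -> float:
    """Error probability implied by the definition of the Phred score."""
    return 10 ** (-q / 10)


def observed_error_rates(
    calls: Iterable[Tuple[str, str, int]],
) -> Dict[Tuple[str, str, int], float]:
    """Map (true_base, called_base, quality) to its observed frequency."""
    per_true_base = Counter()
    per_transition = Counter()
    for true_base, called_base, q in calls:
        per_true_base[(true_base, q)] += 1
        per_transition[(true_base, called_base, q)] += 1
    return {
        (t, c, q): n / per_true_base[(t, q)]
        for (t, c, q), n in per_transition.items()
    }


# 999 correct A calls and one A->C error, all at Q30
calls = [("A", "A", 30)] * 999 + [("A", "C", 30)]
rates = observed_error_rates(calls)
print(rates[("A", "C", 30)], nominal_error_rate(30))
```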

Forward Read R1 Error Plot

Reverse Read R2 Error Plot

The PDF versions of these plots are available here:

4. DADA2 Result Summary. The table below summarizes the DADA2 analysis, tracking the paired-read counts of each sample through every step of the DADA2 denoising process: end-trimming (filtered), denoising (denoisedF, denoisedR), pair merging (merged), and chimera removal (nonchim).

Sample ID	F20260106.S01	F20260106.S02	F20260106.S03	F20260106.S04	F20260106.S05	F20260106.S06	F20260106.S07	F20260106.S08	F20260106.S09	F20260106.S10	F20260106.S11	F20260106.S12	F20260106.S13	F20260106.S14	F20260106.S15	F20260106.S16	F20260106.S17	F20260106.S18	F20260106.S19	F20260106.S20	F20260106.S21	F20260106.S22	F20260106.S23	F20260106.S24	F20260106.S25	F20260106.S26	F20260106.S27	F20260106.S28	F20260106.S29	F20260106.S30	F20260106.S31	F20260106.S32	F20260106.S33	F20260106.S34	F20260106.S35	F20260106.S36	F20260106.S37	F20260106.S38	F20260106.S39	F20260106.S40	F20260106.S41	F20260106.S42	F20260106.S43	F20260106.S44	F20260106.S45	F20260106.S46	F20260106.S47	F20260106.S48	F20260106.S49	F20260106.S50	F20260106.S51	F20260106.S52	F20260106.S53	F20260106.S54	F20260106.S55	F20260106.S56	F20260106.S57	F20260106.S58	F20260106.S59	F20260106.S60	F20260106.S61	F20260106.S62	F20260106.S63	F20260106.S64	F20260106.S65	F20260106.S66	F20260106.S67	F20260106.S68	F20260106.S69	F20260106.S70	F20260106.S71	F20260106.S72	F20260106.S73	F20260106.S74	F20260106.S75	F20260106.S76	F20260106.S77	F20260106.S78	F20260106.S79	F20260106.S80	F20260106.S81	F20260106.S82	F20260106.S83	F20260106.S84	Row Sum	Percentage
input	973,897	692,043	699,735	684,007	689,941	764,828	656,834	644,759	802,545	694,464	746,302	632,638	756,283	690,560	685,024	607,491	615,570	656,646	671,484	795,654	696,244	616,904	669,365	692,519	782,743	704,896	700,915	606,420	663,751	631,203	637,172	619,843	703,857	736,111	715,257	753,876	755,366	554,932	682,809	641,272	593,204	639,380	703,451	640,450	601,568	619,148	662,786	665,009	908,764	787,914	507,674	603,981	668,045	690,476	660,473	698,861	644,875	730,522	590,677	595,546	784,211	665,585	599,024	706,375	559,595	693,126	704,086	599,869	592,842	713,038	711,980	665,722	839,200	586,994	691,716	730,804	753,261	839,559	785,950	841,852	907,553	874,880	838,181	896,928	58,521,295	100.00%
filtered	973,651	691,880	699,594	683,833	689,767	764,629	656,651	644,561	802,344	694,287	746,120	632,471	756,066	690,406	684,844	607,325	615,405	656,475	671,282	795,491	696,076	616,746	669,195	692,371	782,540	704,716	700,730	606,285	663,576	631,044	637,027	619,695	703,670	735,937	715,064	753,675	755,166	554,790	682,649	641,090	593,034	639,251	703,245	640,294	601,427	619,009	662,601	664,857	908,543	787,728	507,523	603,821	667,854	690,294	660,282	698,697	644,709	730,355	590,512	595,389	784,000	665,373	598,831	706,185	559,464	692,915	703,920	599,697	592,695	712,841	711,803	665,538	838,961	586,808	691,520	730,636	753,079	839,342	785,752	841,653	907,314	874,655	837,952	896,705	58,506,188	99.97%
denoisedF	968,574	688,794	694,501	675,438	687,018	761,883	649,413	636,411	799,093	685,738	743,484	630,630	747,062	687,115	677,432	598,163	608,070	651,419	665,797	793,065	693,382	610,495	664,832	690,754	774,329	702,275	695,603	604,430	656,366	626,790	630,085	612,088	700,291	730,314	707,665	746,121	746,507	547,593	674,867	634,369	585,633	632,541	694,551	633,746	598,712	613,875	656,530	662,489	905,211	778,673	500,603	600,858	659,227	687,195	651,743	695,109	635,844	721,759	583,129	589,971	774,085	657,669	591,031	699,977	552,491	684,167	697,378	592,466	586,532	704,484	703,640	655,964	826,324	578,469	684,675	726,692	748,523	828,837	780,168	836,893	897,204	864,934	825,546	887,742	57,971,546	99.06%
denoisedR	955,610	680,025	681,420	663,029	677,893	751,991	636,686	623,372	788,922	671,198	734,437	622,568	731,961	677,431	662,988	584,947	595,334	640,916	653,382	783,409	685,494	597,415	652,964	682,897	758,437	693,604	685,973	597,116	643,506	616,793	618,427	600,246	690,449	718,239	693,530	731,931	731,006	536,635	660,908	622,219	574,317	620,654	680,816	621,436	590,891	603,159	644,374	652,832	893,950	762,582	490,036	591,610	645,297	678,406	637,228	686,237	622,207	707,419	571,557	578,965	757,155	645,069	577,634	687,098	540,562	668,594	684,050	580,223	575,769	690,102	689,889	642,407	806,065	565,231	671,557	717,805	738,405	811,677	767,418	825,430	878,431	848,883	806,687	870,160	56,935,582	97.29%
merged	900,680	649,713	614,294	587,737	648,701	722,476	572,946	552,472	756,231	586,139	709,868	598,553	651,228	641,674	595,314	506,583	524,017	597,070	591,830	756,247	666,341	536,773	605,862	665,974	689,239	671,630	650,061	577,118	585,496	577,539	547,932	534,258	657,748	667,538	607,367	646,633	643,788	471,484	578,657	551,537	498,183	559,408	607,014	556,607	569,129	555,864	593,502	624,679	862,606	664,320	431,779	548,524	565,823	647,245	547,584	652,197	543,354	620,403	509,276	525,882	650,655	555,644	488,802	629,902	463,120	592,138	616,309	499,155	513,354	598,684	602,732	556,210	695,483	494,855	608,995	681,980	691,520	701,847	695,004	776,721	775,531	751,811	683,937	752,137	51,656,653	88.27%
nonchim	515,324	406,176	266,627	285,711	427,203	472,317	311,540	283,772	490,307	264,540	465,097	415,772	331,489	409,319	291,955	260,232	258,472	356,659	266,090	508,889	480,728	272,510	329,583	449,083	412,653	448,342	454,766	378,651	343,209	318,112	249,754	255,588	408,250	378,661	272,018	276,508	300,739	235,863	251,589	256,854	200,414	322,061	282,576	279,294	397,934	314,321	327,671	405,204	567,795	279,242	193,692	281,576	285,870	408,125	221,371	388,836	276,640	290,170	276,941	300,080	261,165	213,705	184,927	348,042	214,409	318,134	334,647	192,827	261,316	223,381	267,533	253,775	323,751	229,959	322,698	385,558	371,225	327,205	338,549	468,953	373,414	362,384	317,522	285,508	27,541,352	47.06%
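The Row Sum and Percentage columns follow from a simple calculation: each step's total read count is divided by the total number of input reads. The Python sketch below reproduces the reported percentages from the row sums in the table.

```python
from typing import Dict

# Row sums copied from the DADA2 result summary table
ROW_SUMS: Dict[str, int] = {
    "input": 58_521_295,
    "filtered": 58_506_188,
    "denoisedF": 57_971_546,
    "denoisedR": 56_935_582,
    "merged": 51_656_653,
    "nonchim": 27_541_352,
}


def percent_of_input(row_sums: Dict[str, int]) -> Dict[str, float]:
    """Percentage of input reads remaining after each step, rounded to 2 decimals."""
    total = row_sums["input"]
    return {step: round(100 * n / total, 2) for step, n in row_sums.items()}


for step, pct in percent_of_input(ROW_SUMS).items():
    print(f"{step:<10} {pct:6.2f}%")
```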

This table can be downloaded as an Excel table below:

5. DADA2 Amplicon Sequence Variants (ASVs). A total of 8,775 unique merged and chimera-free ASV sequences were identified. Their read counts in each sample are available in the "ASV Read Count Table", with one row per ASV sequence and one column per sample. This read count table can be used to compare microbial profiles among samples, and the sequences provided in the table can be used for taxonomy assignment.
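Because samples are sequenced to different depths (see the nonchim row of the DADA2 result summary table), raw counts are usually converted to relative abundances before profiles are compared. The Python sketch below assumes the read count table has been loaded into a nested dictionary keyed by ASV and then by sample; the ASV and sample keys in the example are hypothetical placeholders, not entries from the actual table.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # counts[asv][sample] = read count


def relative_abundance(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Divide each count by its sample's total reads (the column sum)."""
    totals: Dict[str, int] = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] = totals.get(sample, 0) + n
    return {
        asv: {s: (n / totals[s] if totals[s] else 0.0) for s, n in per_sample.items()}
        for asv, per_sample in counts.items()
    }


toy = {"ASV_1": {"S1": 30, "S2": 0}, "ASV_2": {"S1": 70, "S2": 50}}
print(relative_abundance(toy))
```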

The table can be downloaded from this link:

Sample Meta Information

Download Sample Meta Information

#SampleID	SampleName	Phenotype	Time	Sex	SubjectID	Group	Phenotype_Time_Sex
F20260106.S01	F20260106.S01	Control	0	F	2813	Control_0	Control_0_F
F20260106.S02	F20260106.S02	Control	0	F	2836	Control_0	Control_0_F
F20260106.S03	F20260106.S03	Control	0	F	2838	Control_0	Control_0_F
F20260106.S04	F20260106.S04	Control	0	F	2845	Control_0	Control_0_F
F20260106.S05	F20260106.S05	NR	0	F	2814	NR_0	NR_0_F
F20260106.S06	F20260106.S06	NR	0	F	2815	NR_0	NR_0_F
F20260106.S07	F20260106.S07	NR	0	F	2823	NR_0	NR_0_F
F20260106.S08	F20260106.S08	NR	0	F	2824	NR_0	NR_0_F
F20260106.S09	F20260106.S09	NR	0	F	2834	NR_0	NR_0_F
F20260106.S10	F20260106.S10	NR	0	F	2839	NR_0	NR_0_F
F20260106.S11	F20260106.S11	NR	0	F	2840	NR_0	NR_0_F
F20260106.S12	F20260106.S12	NR	0	F	2842	NR_0	NR_0_F
F20260106.S13	F20260106.S13	PTSD	0	F	2835	PTSD_0	PTSD_0_F
F20260106.S14	F20260106.S14	PTSD	0	F	2841	PTSD_0	PTSD_0_F
F20260106.S15	F20260106.S15	PTSD	0	F	2843	PTSD_0	PTSD_0_F
F20260106.S16	F20260106.S16	PTSD	0	F	2844	PTSD_0	PTSD_0_F
F20260106.S17	F20260106.S17	Control	4	F	2813	Control_4	Control_4_F
F20260106.S18	F20260106.S18	Control	4	F	2836	Control_4	Control_4_F
F20260106.S19	F20260106.S19	Control	4	F	2838	Control_4	Control_4_F
F20260106.S20	F20260106.S20	Control	4	F	2845	Control_4	Control_4_F
F20260106.S21	F20260106.S21	NR	4	F	2814	NR_4	NR_4_F
F20260106.S22	F20260106.S22	NR	4	F	2815	NR_4	NR_4_F
F20260106.S23	F20260106.S23	NR	4	F	2823	NR_4	NR_4_F
F20260106.S24	F20260106.S24	NR	4	F	2824	NR_4	NR_4_F
F20260106.S25	F20260106.S25	NR	4	F	2834	NR_4	NR_4_F
F20260106.S26	F20260106.S26	NR	4	F	2839	NR_4	NR_4_F
F20260106.S27	F20260106.S27	NR	4	F	2840	NR_4	NR_4_F
F20260106.S28	F20260106.S28	NR	4	F	2842	NR_4	NR_4_F
F20260106.S29	F20260106.S29	PTSD	4	F	2835	PTSD_4	PTSD_4_F
F20260106.S30	F20260106.S30	PTSD	4	F	2841	PTSD_4	PTSD_4_F
F20260106.S31	F20260106.S31	PTSD	4	F	2843	PTSD_4	PTSD_4_F
F20260106.S32	F20260106.S32	PTSD	4	F	2844	PTSD_4	PTSD_4_F
F20260106.S33	F20260106.S33	Control	8	F	2813	Control_8	Control_8_F
F20260106.S34	F20260106.S34	Control	8	F	2836	Control_8	Control_8_F
F20260106.S35	F20260106.S35	Control	8	F	2838	Control_8	Control_8_F
F20260106.S36	F20260106.S36	Control	8	F	2845	Control_8	Control_8_F
F20260106.S37	F20260106.S37	NR	8	F	2814	NR_8	NR_8_F
F20260106.S38	F20260106.S38	NR	8	F	2815	NR_8	NR_8_F
F20260106.S39	F20260106.S39	NR	8	F	2823	NR_8	NR_8_F
F20260106.S40	F20260106.S40	NR	8	F	2824	NR_8	NR_8_F
F20260106.S41	F20260106.S41	NR	8	F	2834	NR_8	NR_8_F
F20260106.S42	F20260106.S42	NR	8	F	2839	NR_8	NR_8_F
F20260106.S43	F20260106.S43	NR	8	F	2840	NR_8	NR_8_F
F20260106.S44	F20260106.S44	NR	8	F	2842	NR_8	NR_8_F
F20260106.S45	F20260106.S45	PTSD	8	F	2835	PTSD_8	PTSD_8_F
F20260106.S46	F20260106.S46	PTSD	8	F	2841	PTSD_8	PTSD_8_F
F20260106.S47	F20260106.S47	PTSD	8	F	2843	PTSD_8	PTSD_8_F
F20260106.S48	F20260106.S48	PTSD	8	F	2844	PTSD_8	PTSD_8_F
F20260106.S49	F20260106.S49	Control	0	M	2816	Control_0	Control_0_M
F20260106.S50	F20260106.S50	Control	0	M	2821	Control_0	Control_0_M
F20260106.S51	F20260106.S51	Control	0	M	2832	Control_0	Control_0_M
F20260106.S52	F20260106.S52	Control	0	M	2851	Control_0	Control_0_M
F20260106.S53	F20260106.S53	NR	0	M	2817	NR_0	NR_0_M
F20260106.S54	F20260106.S54	NR	0	M	2818	NR_0	NR_0_M
F20260106.S55	F20260106.S55	NR	0	M	2819	NR_0	NR_0_M
F20260106.S56	F20260106.S56	NR	0	M	2820	NR_0	NR_0_M
F20260106.S57	F20260106.S57	PTSD	0	M	2827	PTSD_0	PTSD_0_M
F20260106.S58	F20260106.S58	PTSD	0	M	2833	PTSD_0	PTSD_0_M
F20260106.S59	F20260106.S59	PTSD	0	M	2828	PTSD_0	PTSD_0_M
F20260106.S60	F20260106.S60	PTSD	0	M	2850	PTSD_0	PTSD_0_M
F20260106.S61	F20260106.S61	Control	4	M	2816	Control_4	Control_4_M
F20260106.S62	F20260106.S62	Control	4	M	2821	Control_4	Control_4_M
F20260106.S63	F20260106.S63	Control	4	M	2832	Control_4	Control_4_M
F20260106.S64	F20260106.S64	Control	4	M	2851	Control_4	Control_4_M
F20260106.S65	F20260106.S65	NR	4	M	2817	NR_4	NR_4_M
F20260106.S66	F20260106.S66	NR	4	M	2818	NR_4	NR_4_M
F20260106.S67	F20260106.S67	NR	4	M	2819	NR_4	NR_4_M
F20260106.S68	F20260106.S68	NR	4	M	2820	NR_4	NR_4_M
F20260106.S69	F20260106.S69	PTSD	4	M	2827	PTSD_4	PTSD_4_M
F20260106.S70	F20260106.S70	PTSD	4	M	2833	PTSD_4	PTSD_4_M
F20260106.S71	F20260106.S71	PTSD	4	M	2828	PTSD_4	PTSD_4_M
F20260106.S72	F20260106.S72	PTSD	4	M	2850	PTSD_4	PTSD_4_M
F20260106.S73	F20260106.S73	Control	8	M	2816	Control_8	Control_8_M
F20260106.S74	F20260106.S74	Control	8	M	2821	Control_8	Control_8_M
F20260106.S75	F20260106.S75	Control	8	M	2832	Control_8	Control_8_M
F20260106.S76	F20260106.S76	Control	8	M	2851	Control_8	Control_8_M
F20260106.S77	F20260106.S77	NR	8	M	2817	NR_8	NR_8_M
F20260106.S78	F20260106.S78	NR	8	M	2818	NR_8	NR_8_M
F20260106.S79	F20260106.S79	NR	8	M	2819	NR_8	NR_8_M
F20260106.S80	F20260106.S80	NR	8	M	2820	NR_8	NR_8_M
F20260106.S81	F20260106.S81	PTSD	8	M	2827	PTSD_8	PTSD_8_M
F20260106.S82	F20260106.S82	PTSD	8	M	2833	PTSD_8	PTSD_8_M
F20260106.S83	F20260106.S83	PTSD	8	M	2828	PTSD_8	PTSD_8_M
F20260106.S84	F20260106.S84	PTSD	8	M	2850	PTSD_8	PTSD_8_M

ASV Read Counts by Sample

#Sample ID	Read Count
F20260106.S63	184,927
F20260106.S68	192,827
F20260106.S51	193,692
F20260106.S41	200,414
F20260106.S62	213,705
F20260106.S65	214,409
F20260106.S55	221,371
F20260106.S70	223,381
F20260106.S74	229,959
F20260106.S38	235,863
F20260106.S31	249,754
F20260106.S39	251,589
F20260106.S72	253,775
F20260106.S32	255,588
F20260106.S40	256,854
F20260106.S17	258,472
F20260106.S16	260,232
F20260106.S61	261,165
F20260106.S69	261,316
F20260106.S10	264,540
F20260106.S19	266,090
F20260106.S03	266,627
F20260106.S71	267,533
F20260106.S35	272,018
F20260106.S22	272,510
F20260106.S36	276,508
F20260106.S57	276,640
F20260106.S59	276,941
F20260106.S50	279,242
F20260106.S44	279,294
F20260106.S52	281,576
F20260106.S43	282,576
F20260106.S08	283,772
F20260106.S84	285,508
F20260106.S04	285,711
F20260106.S53	285,870
F20260106.S58	290,170
F20260106.S15	291,955
F20260106.S60	300,080
F20260106.S37	300,739
F20260106.S07	311,540
F20260106.S46	314,321
F20260106.S83	317,522
F20260106.S30	318,112
F20260106.S66	318,134
F20260106.S42	322,061
F20260106.S75	322,698
F20260106.S73	323,751
F20260106.S78	327,205
F20260106.S47	327,671
F20260106.S23	329,583
F20260106.S13	331,489
F20260106.S67	334,647
F20260106.S79	338,549
F20260106.S29	343,209
F20260106.S64	348,042
F20260106.S18	356,659
F20260106.S82	362,384
F20260106.S77	371,225
F20260106.S81	373,414
F20260106.S28	378,651
F20260106.S34	378,661
F20260106.S76	385,558
F20260106.S56	388,836
F20260106.S45	397,934
F20260106.S48	405,204
F20260106.S02	406,176
F20260106.S54	408,125
F20260106.S33	408,250
F20260106.S14	409,319
F20260106.S25	412,653
F20260106.S12	415,772
F20260106.S05	427,203
F20260106.S26	448,342
F20260106.S24	449,083
F20260106.S27	454,766
F20260106.S11	465,097
F20260106.S80	468,953
F20260106.S06	472,317
F20260106.S21	480,728
F20260106.S09	490,307
F20260106.S20	508,889
F20260106.S01	515,324
F20260106.S49	567,795
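The per-sample totals above presumably correspond to the column sums of the ASV Read Count Table. A minimal Python sketch, assuming the downloaded table is tab-separated with the ASV sequence (or ID) in the first column and one integer count column per sample; the file layout and the function name `sample_read_counts` are hypothetical:

```python
import csv
import io


def sample_read_counts(asv_table_tsv: str) -> list[tuple[str, int]]:
    """Total reads per sample in an ASV read count table, smallest first.

    Assumes a tab-separated layout: the first column holds the ASV sequence
    (or ID) and every remaining column holds integer counts for one sample.
    """
    reader = csv.reader(io.StringIO(asv_table_tsv), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = {sample: 0 for sample in samples}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for sample, value in zip(samples, row[1:]):
            totals[sample] += int(value)
    # Sort ascending by total, like the read count table above
    return sorted(totals.items(), key=lambda item: item[1])


toy = "ASV\tS1\tS2\nACGT\t5\t1\nTTGA\t2\t0\n"
print(sample_read_counts(toy))  # [('S2', 1), ('S1', 7)]
```

Reading the table from a string keeps the example self-contained; for a real download, pass the file contents instead.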

VII. Analysis - Read Taxonomy Assignment

Read Taxonomy Assignment - Methods

The closed-reference taxonomy assignment of the ASV sequences using BLASTN is based on the algorithm published by Al-Hebshi et al. (2015) [2].

The species-level, open-reference 16S rRNA NGS reads taxonomy assignment pipeline

Version 20210310a

1. Raw sequence reads in FASTA format were BLASTN-searched against a combined set of 16S rRNA reference sequences, the FOMC 16S rRNA Reference Sequences version 20221029 (https://microbiome.forsyth.org/ftp/refseq/). This set consists of HOMD (version 15.22, http://www.homd.org/index.php?name=seqDownload&file&type=R), the Mouse Oral Microbiome Database (MOMD version 5.1, https://momd.org/ftp/16S_rRNA_refseq/MOMD_16S_rRNA_RefSeq/V5.1/), and the NCBI 16S rRNA reference sequence set (https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz). These sequences were screened and combined, removing short sequences (<1000 nt), chimeras, duplicates and sub-sequences, as well as sequences with poor taxonomy annotation (e.g., without species information). This process yielded 1,015 full-length 16S rRNA sequences from HOMD V15.22, 356 from MOMD V5.1, and 22,126 from NCBI, a total of 23,497 sequences. Altogether, these sequences represent 17,035 oral and non-oral microbial species.

NCBI BLASTN version 2.7.1+ (Zhang et al., 2000) [3] was used with default parameters. Reads with ≥ 98% sequence identity to the matched reference and ≥ 90% alignment length (i.e., ≥ 90% of the read length was aligned to the reference and used to calculate the percent identity) were classified according to the taxonomy of the reference sequence with the highest sequence identity. If a read matched reference sequences representing more than one species with equal percent identity and alignment length, it was subjected to chimera checking with USEARCH v8.1.1861 (Edgar 2010) [4]. Non-chimeric reads with multi-species best hits were considered valid and were assigned a unique species notation (e.g., spp) denoting unresolvable multiple species.
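The step 1 decision rule can be sketched in Python as below. This is an illustration only, not the pipeline's code: the `Hit` record and `classify_read` function are hypothetical, breaking ties between equal-identity hits by longer alignment is an assumption, and the USEARCH chimera check applied to multi-species reads is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str      # species of the matched reference sequence
    identity: float   # percent identity over the aligned region
    aligned_len: int  # number of read bases covered by the alignment


def classify_read(read_len, hits, min_identity=98.0, min_coverage=90.0):
    """Apply the identity/coverage rule to the BLASTN hits of one read.

    Returns ("single", species), ("multi", [species, ...]) or ("unassigned", None).
    """
    passing = [
        h for h in hits
        if h.identity >= min_identity
        and 100.0 * h.aligned_len / read_len >= min_coverage
    ]
    if not passing:
        return ("unassigned", None)  # goes on to the de novo step
    # Best hit: highest identity, then longest alignment (tie-break assumed)
    best = max((h.identity, h.aligned_len) for h in passing)
    species = sorted({h.species for h in passing
                      if (h.identity, h.aligned_len) == best})
    if len(species) == 1:
        return ("single", species[0])
    # Equal best hits to several species; the pipeline chimera-checks these
    # reads with USEARCH before accepting them (not modeled here).
    return ("multi", species)


print(classify_read(400, [Hit("A", 99.0, 390), Hit("B", 99.0, 390)]))
```

Several hits to the same species collapse into one entry through the set, so only ties between different species produce a "multi" result.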

2. Unassigned reads (i.e., reads with < 98% identity or < 90% alignment length) were pooled, and reads < 200 bases were removed. The remaining reads were subjected to de novo operational taxonomic unit (OTU) calling and chimera checking using USEARCH v8.1.1861 (Edgar 2010) [4], with 98% sequence identity as the cutoff, i.e., species-level OTUs. This step produced species-level de novo OTUs clustered at 98% identity. A representative read from each OTU was then BLASTN-searched against the same reference sequence set to determine the closest species for these potential novel species. These potential novel species were pooled with the reads assigned to the species level in the previous step for downstream analyses.
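Step 2 can be illustrated with a simplified sketch. Reads shorter than 200 bases are dropped, then unique sequences are visited in order of decreasing abundance; each joins the first existing centroid within 98% identity or starts a new OTU. This is a toy greedy clustering, not USEARCH's implementation: the hypothetical `identity` helper only compares equal-length sequences position by position, standing in for a real pairwise alignment, and chimera checking is omitted.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Toy percent identity: position-by-position match of equal-length reads."""
    if len(a) != len(b):
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_otus(unassigned, min_len=200, cutoff=98.0):
    """Drop short reads, then cluster greedily by decreasing abundance.

    Returns a dict mapping each OTU centroid sequence to its member read count.
    """
    counts = Counter(seq for seq in unassigned if len(seq) >= min_len)
    otus = {}
    for seq, n in counts.most_common():
        for centroid in otus:
            if identity(seq, centroid) >= cutoff:
                otus[centroid] += n  # joins the first matching OTU
                break
        else:
            otus[seq] = n  # no centroid within the cutoff: new OTU
    return otus


base = "A" * 200
variant = "C" + "A" * 199  # 99.5% identical to base
distant = "C" * 200
print(greedy_otus([base, base, base, variant, distant, "A" * 150]))
```

Processing abundant sequences first makes them the centroids, so rare variants are absorbed into the nearest common sequence rather than founding their own OTUs.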

References:

[2] Al-Hebshi NN, Nasher AT, Idris AM, Chen T. Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples. J Oral Microbiol. 2015 Sep 29;7:28934. doi: 10.3402/jom.v7.28934. PMID: 26426306; PMCID: PMC4590409.
[3] Zhang Z, Schwartz S, Wagner L, Miller W. A greedy algorithm for aligning DNA sequences. J Comput Biol. 2000 Feb-Apr;7(1-2):203-14. doi: 10.1089/10665270050081478. PMID: 10890397.
[4] Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12. PMID: 20709691.

3. Designations used in the taxonomy:

	1) Taxonomy levels are indicated by these prefixes:
	
	   k__: domain/kingdom
	   p__: phylum
	   c__: class
	   o__: order
	   f__: family
	   g__: genus  
	   s__: species
	
	   Example: 
	
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__faecis
		
	2) Unique level identified – known species:
	   
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis
	
	   The above example shows that some reads match a single species (all levels are unique).
	
	3) Non-unique level identified – known species:

	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3
	   
	   The above example “s__multispecies_spp123_3” indicates that certain reads match 3 species of the 
	   genus Roseburia equally; “spp123” is a temporarily assigned species ID.
	
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__multigenus;s__multispecies_spp234_5
	   
	   The above example indicates that certain reads match 5 different species equally, and these species belong to multiple genera; 
	   “spp234” is a temporarily assigned species ID.
	
	4) Unique level identified – unknown species, potential novel species:
	   
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis_nov_97%
	   
	   The above example indicates that some reads have no match to any of the reference sequences with 
	   sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. However, this group 
	   of reads (actually the representative read from a de novo OTU) has 97% identity to 
	   Roseburia hominis, so it is a potential novel species closest to Roseburia hominis 
	   (but not the same species).
	
	5) Multiple level identified – unknown species, potential novel species:
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_sppn123_3_nov_96%
	
	   The above example indicates that some reads have no match to any of the reference sequences 
	   with sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. 
	   However, this group of reads (actually the representative read from a de novo OTU) 
	   has 96% identity equally to 3 species in Roseburia. Thus there is no single 
	   closest species; instead, this group of reads matches multiple species equally at 96%. 
	   Because these reads passed the chimera check, they represent a potential novel species. 
	   “sppn123” is a temporary ID for this potential novel species.
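The designations above can be read programmatically. A minimal sketch, assuming the prefixed lineage notation shown in the examples (the species-level read count table below omits the k__/p__ prefixes, so this parser applies to the prefixed form only); the names `parse_lineage` and `designation` are hypothetical:

```python
RANKS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_lineage(lineage: str) -> dict:
    """Split 'k__Bacteria;p__Firmicutes;...;s__hominis' into {rank: name}."""
    parsed = {}
    for field in lineage.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if not sep or prefix not in RANKS:
            raise ValueError(f"unrecognized taxonomy field: {field!r}")
        parsed[RANKS[prefix]] = name.strip()  # tolerate "s__ hominis"
    return parsed


def designation(parsed: dict) -> str:
    """Map the species field to one of the four designations described above."""
    species = parsed.get("species", "")
    multi = species.startswith("multispecies_")
    novel = "_nov_" in species
    if novel:
        return "multiple, potential novel" if multi else "unique, potential novel"
    return "multiple, known" if multi else "unique, known"


lineage = ("k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;"
           "f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3")
print(designation(parse_lineage(lineage)))  # multiple, known
```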

4. The taxonomy assignment algorithm is illustrated in the flowchart below:

Read Taxonomy Assignment - Result Summary *

Code	Category	MPC=0% (>=1 read)	MPC=0.01% (>=1381 reads)
A	Total reads	27,541,352	27,541,352
B	Total assigned reads	13,815,404	13,815,404
C	Assigned reads in species with read count < MPC	0	5,651
D	Assigned reads in samples with read count < 500	0	0
E	Total samples	84	84
F	Samples with reads >= 500	84	84
G	Samples with reads < 500	0	0
H	Total assigned reads used for analysis (B-C-D)	13,815,404	13,809,753
I	Reads assigned to single species	9,596,314	9,592,367
J	Reads assigned to multiple species	4,219,090	4,217,386
K	Reads assigned to novel species	0	0
L	Total number of species	84	46
M	Number of single species	57	36
N	Number of multi-species	27	10
O	Number of novel species	0	0
P	Total unassigned reads	13,725,948	13,725,948
Q	Chimeric reads	0	0
R	Reads without BLASTN hits	2	2
S	Others: short, low quality, singletons, etc.	13,725,946	13,725,946
	A=B+P=C+D+H+Q+R+S
	E=F+G
	B=C+D+H
	H=I+J+K
	L=M+N+O
	P=Q+R+S

* MPC = Minimal percent count: the minimum read count a species must have, expressed as a percentage of all assigned reads; species with read counts < MPC were removed.

* Samples with reads < 500 were removed from downstream analyses.

* The assignment result from MPC=0.01% was used in the downstream analyses.
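The MPC filter and the identities listed under the summary table can be checked with a short sketch. It assumes the cutoff is rounded down, which reproduces the ">=1381 reads" shown for MPC=0.01% of 13,815,404 assigned reads, and it uses the values from the MPC=0.01% column of the table; the function names are hypothetical.

```python
import math


def mpc_min_reads(total_assigned: int, mpc_percent: float) -> int:
    """Smallest read count a species needs to pass the MPC filter.

    Rounding down is an assumption that matches the ">=1381 reads" label.
    """
    return math.floor(total_assigned * mpc_percent / 100.0)


def apply_mpc(species_counts: dict, mpc_percent: float) -> dict:
    """Keep only species whose total read count reaches the MPC cutoff."""
    cutoff = mpc_min_reads(sum(species_counts.values()), mpc_percent)
    return {sp: n for sp, n in species_counts.items() if n >= cutoff}


def failed_identities(t: dict) -> list:
    """Return the summary-table identities that do not hold (empty if consistent)."""
    checks = {
        "A=B+P": t["A"] == t["B"] + t["P"],
        "A=C+D+H+Q+R+S": t["A"] == sum(t[k] for k in "CDHQRS"),
        "E=F+G": t["E"] == t["F"] + t["G"],
        "B=C+D+H": t["B"] == t["C"] + t["D"] + t["H"],
        "H=I+J+K": t["H"] == t["I"] + t["J"] + t["K"],
        "L=M+N+O": t["L"] == t["M"] + t["N"] + t["O"],
        "P=Q+R+S": t["P"] == t["Q"] + t["R"] + t["S"],
    }
    return [name for name, ok in checks.items() if not ok]


# Values from the MPC=0.01% column of the summary table
mpc_001 = dict(A=27_541_352, B=13_815_404, C=5_651, D=0, E=84, F=84, G=0,
               H=13_809_753, I=9_592_367, J=4_217_386, K=0, L=46, M=36,
               N=10, O=0, P=13_725_948, Q=0, R=2, S=13_725_946)
print(mpc_min_reads(mpc_001["B"], 0.01))  # 1381
print(failed_identities(mpc_001))         # []
```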

Read Taxonomy Assignment - ASV Species-Level Read Counts Table

This table shows the read counts for each sample (columns) and each species identified based on the ASV sequences. The downstream analyses were based on this table.

SPID	Taxonomy	F20260106.S01	F20260106.S02	F20260106.S03	F20260106.S04	F20260106.S05	F20260106.S06	F20260106.S07	F20260106.S08	F20260106.S09	F20260106.S10	F20260106.S11	F20260106.S12	F20260106.S13	F20260106.S14	F20260106.S15	F20260106.S16	F20260106.S17	F20260106.S18	F20260106.S19	F20260106.S20	F20260106.S21	F20260106.S22	F20260106.S23	F20260106.S24	F20260106.S25	F20260106.S26	F20260106.S27	F20260106.S28	F20260106.S29	F20260106.S30	F20260106.S31	F20260106.S32	F20260106.S33	F20260106.S34	F20260106.S35	F20260106.S36	F20260106.S37	F20260106.S38	F20260106.S39	F20260106.S40	F20260106.S41	F20260106.S42	F20260106.S43	F20260106.S44	F20260106.S45	F20260106.S46	F20260106.S47	F20260106.S48	F20260106.S49	F20260106.S50	F20260106.S51	F20260106.S52	F20260106.S53	F20260106.S54	F20260106.S55	F20260106.S56	F20260106.S57	F20260106.S58	F20260106.S59	F20260106.S60	F20260106.S61	F20260106.S62	F20260106.S63	F20260106.S64	F20260106.S65	F20260106.S66	F20260106.S67	F20260106.S68	F20260106.S69	F20260106.S70	F20260106.S71	F20260106.S72	F20260106.S73	F20260106.S74	F20260106.S75	F20260106.S76	F20260106.S77	F20260106.S78	F20260106.S79	F20260106.S80	F20260106.S81	F20260106.S82	F20260106.S83	F20260106.S84
SP104	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-14];bacterium_MOT-183	0	0	134	39	0	0	59	1032	0	1102	0	0	330	0	36	0	268	309	117	0	0	296	13	0	57	0	0	0	596	11	38	467	0	0	218	270	829	2101	538	124	49	1163	58	98	0	179	160	0	0	19	15	0	84	0	260	0	368	128	1126	7	82	0	66	0	584	582	342	181	749	92	257	207	193	52	106	0	0	302	0	173	5635	4771	3869	617
SP110	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-11];bacterium_MOT-177	0	0	761	2183	0	0	3335	12416	0	8483	0	0	1206	0	1825	1878	1393	205	358	0	0	652	15	0	428	0	0	0	4090	181	2344	4776	0	1734	3545	582	8746	11712	3158	2804	1941	6199	2023	1363	6	347	671	0	0	11	82	0	43	0	662	0	1145	2191	1172	461	3694	3443	2185	5752	6889	9633	1950	263	6280	1181	2469	895	296	118	704	0	0	639	0	0	225	170	3746	128
SP169	Bacteria;Firmicutes;Clostridia;Eubacteriales;Eubacteriales_[F-1];Eubacteriales_[G-4];bacterium_MOT-165	0	0	62	82	0	0	22	163	0	28	0	0	55	0	78	135	86	35	212	0	0	79	79	0	197	0	168	0	23	125	190	143	0	0	37	40	131	110	79	71	29	26	92	119	49	104	135	0	0	407	86	0	182	0	102	0	19	80	176	53	119	27	44	0	34	0	26	22	28	5	63	61	241	94	78	0	0	172	95	0	0	0	0	0
SP18	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnoclostridium;[Clostridium] scindens	0	0	208	677	0	0	442	1087	0	719	0	0	688	0	557	844	689	170	1245	0	0	312	141	0	811	0	203	0	734	0	380	411	0	198	386	350	710	694	309	540	244	415	371	148	119	100	205	0	0	546	247	0	477	0	386	0	393	265	328	383	454	221	142	1199	489	490	183	171	456	129	330	440	722	224	122	0	0	451	52	1301	377	240	1275	184
SP19	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae;Muribaculaceae_[G-1];bacterium_MOT-129	117	1110	1710	961	37778	14587	13276	13474	342	14789	2268	15	2649	2258	2115	2092	3543	1808	74	94	1105	6939	6866	4434	93	74331	519	1097	9845	1381	0	1991	1743	11836	3895	4371	5157	2235	2267	8270	7221	2984	1017	3148	685	1849	1502	1854	470	12994	714	602	1347	104	1514	503	320	8101	217	2484	436	28030	2412	3514	1604	9805	13521	684	100	16649	438	312	1265	252	1359	236	793	4969	24680	8583	1410	1467	2086	3363
SP195	Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Ileibacterium;valens	0	0	456	335	0	0	2	0	0	231	0	14905	203	360	8402	35	41	118556	9	0	8	39006	151009	3	20804	0	36691	0	2597	3139	167	911	0	0	0	1325	0	60	5063	4246	368	3258	1555	6583	185215	37635	29261	4	0	8089	14543	2	4420	3	7872	0	319	1885	248	25669	0	0	0	0	0	5563	6799	16145	1282	1410	2362	481	19259	13121	43072	0	0	8724	103965	60	2956	2498	250	1244
SP227	Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Erysipelatoclostridium;[Clostridium] saccharogumia	0	0	3104	296	0	0	93	281	0	0	0	23	0	0	504	413	187	345	0	0	0	662	339	0	150	0	0	0	70	326	195	782	0	99	278	0	223	0	0	373	2409	0	411	679	155	148	84	0	0	1045	115	0	164	0	0	0	127	173	191	580	497	67	106	523	167	0	0	724	159	0	163	642	409	80	73	0	0	259	421	0	0	0	88	65
SP23	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Oscillospiraceae_[G-2];bacterium_MOT-149	0	0	18	58	0	0	174	0	0	182	0	0	362	0	33	74	623	54	35	0	0	149	40	0	329	0	35	0	260	147	106	74	0	104	13	95	0	184	128	21	21	194	37	16	3	75	174	0	0	130	84	0	75	0	146	0	221	105	69	20	220	99	69	0	0	154	77	0	19	48	16	23	168	75	74	0	0	164	29	0	59	35	799	131
SP247	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnoclostridium;pacaense	0	0	251	926	0	0	526	1428	0	921	0	0	839	0	648	810	1220	249	82	0	0	501	84	0	641	0	82	0	2180	93	686	941	0	0	534	636	978	1621	1324	1195	328	2117	626	298	0	139	182	0	0	261	323	0	448	0	805	0	1459	608	1571	72	576	482	594	295	1041	1081	467	642	2501	371	820	745	928	279	228	0	0	335	0	0	498	329	1175	268
SP254	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-12];bacterium_MOT-179	0	0	52	399	0	0	204	1878	0	575	0	0	375	0	382	732	457	113	37	0	0	256	68	0	209	0	23	0	1564	46	381	407	0	442	557	275	659	728	444	283	188	1117	666	272	0	92	95	0	0	22	125	0	179	0	294	0	877	262	596	13	121	193	190	120	724	1001	161	99	417	144	297	298	439	111	119	0	0	249	0	0	361	238	643	144
SP28	Bacteria;Firmicutes;Clostridia;Eubacteriales;Eubacteriales_[F-1];Eubacteriales_[G-2];bacterium_MOT-162	0	0	0	36	0	0	83	0	0	0	0	164	0	0	1348	474	5265	482	21178	0	0	151	1187	0	1535	0	18799	0	177	661	2731	289	0	0	0	0	0	0	0	206	339	0	381	202	356	242	303	0	0	3834	299	0	315	0	0	0	253	215	119	791	0	0	425	0	0	0	0	93	51	0	113	251	683	16	1419	0	0	199	272	0	0	0	27	96
SP29	Bacteria;Firmicutes;Clostridia;Eubacteriales;Eubacteriales_[F-1];Eubacteriales_[G-3];bacterium_MOT-163	0	0	0	0	0	0	299	294	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	35	0	301	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	274	22	126	0	78	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
SP30	Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;muciniphila	176503	111038	2448	0	88151	141368	106310	14373	108128	4413	157649	160385	66825	156499	0	732	5136	78629	8437	183281	230211	4396	945	107635	168082	103093	254805	114258	1219	27956	260	6869	152175	151247	19076	7332	1885	4109	5378	0	0	1309	603	67085	120781	94580	80309	160586	242317	8904	1395	89015	108289	181801	247	121110	28709	38252	367	139933	61691	232	746	172589	192	68314	158705	78	0	2729	4173	35253	94417	67493	116305	165276	131203	88617	50595	235787	118478	120403	92	0
SP31	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-11];bacterium_MOT-178	0	0	114	0	0	0	0	0	0	18	0	0	216	0	149	703	48	423	42	0	0	328	349	0	11	0	0	0	16	140	12	1914	0	0	101	0	311	18	0	72	0	23	362	0	0	1071	471	0	0	0	395	0	12899	14	185	0	39	465	332	1553	111	0	81	0	269	373	187	0	95	175	414	0	2964	575	5695	0	0	297	142	0	4080	3803	86	0
SP35	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae;Muribaculum;intestinale	802	6763	2690	3599	9981	7363	3331	2787	5607	2424	337	65	3055	8118	9596	3738	3321	3326	1367	825	2326	3368	3081	0	1006	9002	1319	0	1420	576	3671	2812	10955	4041	1650	3787	4125	1224	3575	6009	1064	3108	3019	3257	645	2282	2220	5623	8296	2057	3875	6339	2817	2664	1077	3479	2365	2609	949	3708	2780	4272	2408	13921	2200	1542	2578	3046	617	3181	2164	1925	1148	1070	1066	5269	4332	3487	2276	7040	3897	4721	3952	5439
SP45	Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Turicibacteraceae;Turicibacter;sanguinis	0	0	0	0	0	0	0	0	0	0	0	1031	0	265	28	61	0	161	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	6	0	0	0	0	0	184	93	0	4111	8173	2344	4778	6383	0	0	0	0	0	0	0	0	0	8	74	18	11212	17	613	52	1291	0	38	601	3977	7	222	403	607	74	3557	2493	0	0	0	0	0	0	0	0	0
SP46	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Acutalibacter;muris	0	0	14	184	0	0	48	110	0	106	0	0	125	0	50	234	401	10	66	0	0	41	0	0	451	0	5	0	136	71	146	112	0	27	85	57	29	128	65	88	95	156	142	15	0	15	0	0	0	101	22	0	155	0	93	0	106	81	92	0	33	27	42	0	18	129	23	22	75	13	83	62	77	121	22	0	0	41	0	0	55	50	56	9
SP49	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Oscillospiraceae_[G-4];bacterium_MOT-151	0	0	514	588	0	0	348	426	0	571	0	26	452	0	1518	1161	287	247	1816	0	0	882	279	0	297	0	283	0	234	338	298	1091	0	391	498	582	2206	416	627	830	279	288	951	786	216	955	557	0	0	3362	624	0	1127	0	716	0	437	941	343	3290	1530	580	442	386	634	291	579	331	274	485	694	684	611	438	876	0	0	1015	227	0	417	385	825	433
SP5	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-1];bacterium_MOT-166	0	0	130	465	0	0	166	210	0	0	0	0	0	0	92	1505	0	0	0	0	0	0	0	0	802	0	29	0	175	0	0	232	0	132	84	0	148	0	85	503	258	0	190	87	0	24	8	0	0	0	0	0	0	0	241	0	167	149	57	0	29	183	43	26	87	359	48	32	336	0	55	249	457	78	104	0	0	0	0	0	4	4	678	248
SP50	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-5];bacterium_MOT-170	0	0	83	164	0	0	86	188	0	0	0	6	29	0	106	61	332	36	20	0	0	0	0	0	36	0	5	0	24	0	136	0	0	149	111	0	123	0	30	59	148	8	38	52	0	0	0	0	0	31	0	0	14	0	0	0	20	26	0	14	136	218	51	150	69	0	0	29	41	0	13	51	0	0	0	0	0	18	0	0	0	0	0	0
SP54	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Anaerostipes;caccae	334	309	0	0	0	0	0	0	220	0	244	53	0	310	0	0	0	0	0	0	50	0	0	167	0	314	3	146	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	119	343	0	0	188	0	453	0	402	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	400	500	0	0	155	17	63	0	0
SP55	Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Erysipelotrichaceae_[G-1];bacterium_MOT-189	44003	0	1012	412	152	146	17277	0	9	2707	0	22133	711	2546	2890	23	106	12500	0	0	0	2990	17663	0	5495	0	15059	12	3733	2654	1271	3248	119	18190	5809	484	2572	30	1491	545	697	969	271	836	9785	2208	1445	7	16	1190	3025	25	51	4	317	50	123	296	0	255	3422	14107	22968	11888	3974	1600	2059	1793	298	790	447	152	896	1601	1376	344	2581	3069	40303	878	52109	43592	7628	52695
SP6	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Anaerotignum;lactatifermentans	0	0	13	77	0	0	92	58	0	71	0	0	102	0	126	206	15	25	6	0	0	63	13	0	41	0	2	0	60	0	31	75	0	0	15	45	76	109	41	60	23	69	66	56	0	25	54	0	0	10	39	0	28	0	33	0	63	44	29	34	24	0	29	0	71	147	61	31	52	14	42	89	118	50	47	0	0	55	10	0	18	19	80	0
SP64	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Oscillospiraceae_[G-6];bacterium_MOT-153	0	0	37	156	0	0	0	0	0	50	0	0	26	0	14	8	22	0	142	0	0	48	8	0	59	0	20	0	28	171	45	22	0	0	0	55	0	58	0	10	11	46	19	76	0	44	129	0	0	19	0	0	278	0	56	0	23	28	78	0	0	0	0	0	0	0	45	0	0	21	39	24	35	29	31	0	0	49	4	0	22	0	250	31
SP65	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Acetatifactor;muris	0	0	91	337	0	0	90	250	0	139	0	0	148	0	122	358	280	21	0	0	0	133	0	0	109	0	0	0	401	0	33	209	0	0	11	83	125	177	54	156	51	222	152	112	0	8	49	0	0	72	66	0	145	2	117	0	161	89	204	0	88	81	61	0	175	197	43	29	247	93	75	116	248	88	78	0	0	194	0	0	0	0	184	34
SP7	Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Romboutsia;timonensis	0	0	0	0	0	0	0	0	0	0	0	194	0	0	76	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	298	145	0	406	1081	0	0	0	0	0	2	0	0	0	0	0	0	115	38	0	493	0	0	0	0	0	53	234	258	19	49	119	35	0	0	0	0	0	0	0	0	0	0	0	0
SP70	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-11];bacterium_MOT-176	0	0	15	731	0	0	366	561	0	714	0	0	1363	0	817	1646	1254	262	21	0	0	624	35	0	673	0	10	0	196	75	15	383	0	80	211	170	1028	1069	131	129	21	869	653	32	0	169	175	0	0	200	973	0	569	0	201	0	1061	32	612	65	30	20	111	665	229	458	157	54	369	106	1446	1213	1637	389	343	0	0	630	18	0	82	55	565	147
SP72	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-2];bacterium_MOT-167	0	0	45	326	0	0	84	158	0	202	0	0	271	0	188	668	201	21	143	0	0	157	11	0	233	0	6	0	321	34	435	254	0	131	68	125	204	355	90	412	133	288	299	109	0	55	97	0	0	86	79	0	243	0	184	0	598	121	335	13	84	181	73	18	158	317	52	113	326	79	320	335	376	184	105	0	0	101	0	0	1086	806	1317	287
SP74	Bacteria;Actinobacteria;Coriobacteriia;Eggerthellales;Eggerthellaceae;Adlercreutzia;muris	63	2	180	82	0	0	263	64	10	51	8	242	74	24	99	71	168	163	436	8	75	211	274	0	218	3	750	0	72	1258	36	48	27	30	51	59	105	40	83	32	31	115	51	19	121	65	37	14	20	86	37	0	3	0	19	0	46	23	23	92	55	79	73	0	42	72	36	27	54	48	38	29	30	18	23	15	23	36	580	0	14	13	50	54
SP75	Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Sutterellaceae;Parasutterella;excrementihominis	179	84	0	0	192	315	37	0	13	0	86	15	9	176	123	149	29	49	16	16	0	55	13	0	0	4	0	0	0	0	45	11	99	0	0	30	9	10	21	1566	71	12	41	137	5	68	85	75	276	22	45	18	0	153	11	103	12	20	10	34	0	0	0	0	0	25	26	503	59	12	97	43	24	34	56	28	0	11	11	0	0	0	0	17
SP76	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Rikenellaceae;Alistipes;sp._MOT-127	23506	15711	339	8766	11252	21656	1893	9048	17904	12054	12196	25	12691	15107	5582	1107	8804	2448	2292	10287	4885	3264	313	1434	2059	13789	324	16203	1519	224	5452	546	22819	23721	3704	3698	3850	2430	4468	974	6326	827	1856	2553	454	7357	8902	1648	16731	523	4359	20399	5429	43720	10023	10340	4038	1892	446	2282	632	605	1143	14797	1322	3690	4056	2353	1577	6750	1848	2932	1018	6295	3819	14519	9384	9854	195	982	13571	19294	10046	15396
SP93	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Lawsonibacter;asaccharolyticus	0	0	65	232	0	0	129	410	0	187	0	0	289	0	240	429	247	44	124	0	0	219	0	0	748	0	10	0	310	44	105	200	0	67	59	163	172	228	167	219	152	384	669	211	0	41	56	0	0	71	91	0	329	0	131	0	226	150	198	18	114	142	82	48	189	242	63	92	244	81	155	125	173	69	90	0	0	123	0	0	448	660	1281	159
SP96	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-3];bacterium_MOT-168	0	0	118	313	0	0	35	53	0	85	0	10	142	0	68	115	211	65	402	0	0	45	14	0	151	0	30	0	57	0	39	50	0	0	74	63	36	70	35	121	77	56	60	21	25	56	53	0	0	104	185	0	216	0	66	0	53	50	41	87	130	86	40	0	20	61	5	11	43	89	15	85	123	189	88	0	0	181	50	66	35	32	829	209
SP97	Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Erysipelatoclostridium;[Clostridium] cocleatum	0	0	0	0	0	0	0	0	0	80	0	0	110	0	0	0	0	0	759	0	0	0	0	0	0	0	118	0	0	0	0	0	0	0	0	1281	0	214	477	0	0	78	0	0	0	0	0	0	0	0	0	0	0	0	28	0	0	0	0	0	0	0	0	0	0	86	54	0	0	80	0	0	0	0	0	0	0	0	0	0	0	0	0	0
SP98	Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Faecalibaculum;rodentium	0	0	200	95	0	0	1055	304	0	167	0	1135	100	36	395	9	39	3318	61	0	0	307	508	0	81	0	251	0	1289	2357	329	395	187	2682	440	109	303	10	215	161	121	128	104	210	2023	655	463	0	0	244	465	0	22	0	177	0	220	395	59	1892	474	965	2559	4838	278	436	786	184	49	210	83	32	109	261	420	0	0	449	3928	10185	7692	6177	1221	10113
SP99	Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptococcaceae;Peptococcaceae_[G-1];bacterium_MOT-146	0	0	7	110	0	0	105	171	0	88	0	0	142	0	73	127	173	58	17	0	0	121	17	0	93	0	0	0	166	4	74	63	0	0	49	68	107	113	78	56	55	124	74	31	0	48	82	0	0	16	74	0	124	0	74	0	103	68	76	14	80	50	44	15	78	121	81	11	40	40	18	48	98	74	58	0	0	78	0	0	117	71	383	44
SPP11	Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Ligilactobacillus;multispecies_spp11_4	4344	898	19019	748	477	215	1365	508	148	95	11	762	253	471	2601	2292	982	1013	31356	255	2342	18807	1099	812	30470	174	36955	31	189	121351	4248	7560	552	934	620	3745	2666	624	3479	9556	2509	1050	11142	1716	559	213	169	71	206	3152	471	82	280	78	315	89	319	533	331	1626	2788	2776	989	431	913	196	566	452	258	1397	2134	724	420	377	289	210	1074	943	2824	2097	979	423	2350	1527
SPP12	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;multispecies_spp12_3	68135	74497	2525	1550	81947	56236	1997	3695	77877	1033	60742	8	1168	36449	1898	1241	676	329	369	99771	25486	763	497	94226	1426	127592	162	56491	604	60	1001	643	36020	14711	7356	1455	3901	1258	2030	1028	7170	861	2652	1926	160	1062	1295	59287	65638	943	2428	39327	878	76489	1537	66792	617	1119	736	476	3017	1708	2137	5528	1282	1483	1870	1457	962	957	1463	3055	1431	640	517	26284	61992	1879	224	18639	1617	4320	1381	3679
SPP19	Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus;multispecies_spp19_4	4659	3308	51399	1017	4343	2935	16203	1163	1684	691	2311	21598	417	14768	65436	2433	28	4257	76330	47	94998	4047	40669	97	2189	13224	1809	760	2520	27088	14483	643	1632	123	2320	40182	629	4121	353	3823	8971	345	1649	3215	11913	603	3788	2	0	19420	2829	5	688	0	6267	0	1748	419	21873	2806	14118	43310	2153	8479	2239	2083	463	5429	6418	9015	957	1968	3104	4453	1579	868	33	242	12005	30	11004	2298	1267	20205
SPP21	Bacteria;Actinobacteria;Actinomycetia;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;multispecies_spp21_2	2400	113	4323	26	42	79	392	0	28	145	47	11141	115	825	1421	11	42	21494	6	51	0	2400	6586	11	543	61	2215	49	85	834	18	266	11	123	172	516	60	82	619	166	189	223	196	454	24569	557	1717	4	0	144	872	18	451	0	287	0	67	136	162	9156	91	66	587	93	143	540	447	2013	128	452	62	20	447	1460	1329	12	0	222	11187	443	3500	1324	120	3403
SPP23	Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;multispecies_spp23_9	0	0	0	0	0	0	0	0	0	0	0	176693	0	0	0	0	0	0	0	0	0	0	0	0	0	0	119	0	0	11	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
SPP24	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Blautia;multispecies_spp24_3	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	8441	109	0	0	0	0	0	0	0	0	0	0	0	0	0	5354	98195	0	0	3684	0	13900	0	14719	0	0	0	0	0	0	0	83	0	0	0	0	0	0	0	0	0	0	0	11366	13413	0	3	1952	0	13	0	8
SPP25	Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;multigenus;multispecies_spp25_7	1843	709	12	0	1099	1645	0	19	365	0	1049	0	0	477	0	0	0	0	0	7563	4547	7	0	14885	3	3572	7	322	0	49	0	6	863	14	0	0	0	0	0	0	221	0	0	0	0	11	4	421	11398	7	3	112	0	3160	0	3338	0	0	0	0	9	21	0	7	0	0	0	0	0	0	0	4	0	0	0	306	327	0	0	6	0	4	4	12
SPP26	Bacteria;Actinobacteria;Coriobacteriia;Eggerthellales;Eggerthellaceae;Adlercreutzia;multispecies_spp26_2	0	0	91	29	0	0	82	36	0	37	0	121	29	0	52	53	105	157	312	0	0	184	134	0	195	0	300	0	20	395	0	13	0	0	0	31	47	0	21	18	0	48	33	22	117	0	18	0	0	110	31	13	21	4	0	0	8	0	9	35	24	47	0	0	17	21	0	0	0	0	17	26	22	9	8	0	26	19	162	15	61	46	81	66
SPP3	Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Limosilactobacillus;multispecies_spp3_8	1560	1872	19433	534	557	408	1430	620	465	209	442	1642	106	722	4832	2634	10	665	22274	0	11512	1572	8721	11	695	132	273	0	873	9391	2235	307	288	90	646	8985	185	842	65	2493	1260	101	818	678	5047	178	907	6	12	12880	71	0	252	43	762	0	1057	184	2277	759	3440	5855	453	1176	463	446	96	578	594	3172	852	687	963	1859	334	292	49	98	1604	79	182	193	118	689
SPP8	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Enterocloster;multispecies_spp8_2	5881	56901	0	0	126008	60156	0	0	138836	0	52652	0	0	39557	0	0	0	0	0	95474	11896	0	0	30162	0	84802	0	73471	0	0	0	0	27180	0	0	0	0	0	0	0	0	0	0	0	0	0	0	89125	1140	0	0	36515	0	219	0	1803	0	0	0	0	0	0	0	33	0	0	0	0	0	0	0	0	0	0	0	4023	3782	0	0	16728	0	0	0	0

Download OTU Tables at Different Taxonomy Levels
Phylum	Count*:	Relative**:	CLR***:
Class	Count*:	Relative**:	CLR***:
Order	Count*:	Relative**:	CLR***:
Family	Count*:	Relative**:	CLR***:
Genus	Count*:	Relative**:	CLR***:
Species	Count*:	Relative**:	CLR***:
* Read count
** Relative abundance (count/total sample count)
*** Centered log ratio transformed abundance
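The relative and CLR columns are both computed per sample, across taxa. A minimal sketch for one sample's count vector; the pseudocount added before taking logs is an assumption, since the report does not state how zero counts were handled, and the function names are hypothetical:

```python
import math


def relative_abundance(counts):
    """Count / total sample count, for one sample's taxon counts."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio of one sample: log(x_i) - mean(log(x)).

    The mean of the logs equals the log of the geometric mean. The pseudocount
    (added so zero counts can be logged) is an assumption.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


sample = [0, 9, 99]  # toy counts for three taxa in one sample
print(relative_abundance(sample))  # [0.0, 0.0833..., 0.9166...]
print(clr(sample))                 # about [-2.3026, 0.0, 2.3026]
```

By construction the CLR values of a sample sum to zero, which is why they are suited to compositional comparisons where only ratios between taxa are meaningful.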


The species listed in the table have full taxonomy and a dynamically assigned species ID specific to this report. When some reads match the reference sequences of more than one species equally well (i.e., same percent identity and alignment coverage), they cannot be assigned to a particular species. Instead, they are assigned to multiple species with the species notation "s__multispecies_spp2_2". In this notation, spp2 is the dynamic ID assigned to these reads that hit multiple sequences, and the "_2" at the end of the notation means there are two species in spp2.

You can look up which species are included in each multi-species assignment in the table below:

Another type of notation is "s__multispecies_sppn2_2", in which the "n" in sppn2 means it is a potential novel species, because all the reads in this species have < 98% identity to any of the reference sequences. These reads were grouped together by de novo OTU clustering at a 98% identity cutoff. A representative sequence was then chosen and searched with BLASTN against the reference database to find the closest match (which is still < 98% identical). This representative sequence also matched more than one species equally well, hence the "spp" in the label.
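Because the label convention described above is regular, it can be parsed mechanically, for example when summarizing a downloaded OTU table. The sketch below is a hypothetical helper (the function name and the returned field names are illustrative, not part of any FOMC tool) and assumes labels follow exactly the two patterns described here, with the "s__" prefix optional because the OTU table rows omit it.

```python
import re

# Matches "s__multispecies_spp2_2", "multispecies_spp24_3", "s__multispecies_sppn2_2", ...
# Group 1: dynamic ID (e.g. "spp2"), group 2: "n" flag, group 3: number of species.
MULTISPECIES_LABEL = re.compile(r"^(?:s__)?multispecies_(spp(n?)\d+)_(\d+)$")


def parse_multispecies(label):
    """Split a multi-species label into its parts; return None for other labels."""
    match = MULTISPECIES_LABEL.match(label)
    if match is None:
        return None
    return {
        "group_id": match.group(1),                # dynamic ID specific to this report
        "potential_novel": match.group(2) == "n",  # "n": < 98% identity to all references
        "n_species": int(match.group(3)),          # number of species sharing the label
    }


# The species-level label is the last field of the semicolon-separated lineage.
lineage = ("Bacteria;Firmicutes;Clostridia;Eubacteriales;"
           "Lachnospiraceae;Blautia;multispecies_spp24_3")
print(parse_multispecies(lineage.split(";")[-1]))
```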

Taxonomy Bar Plots for All Samples

Taxonomy Bar Plots for Individual Comparison Groups

Comparison No.	Comparison Name	Families		Genera		Species
Comparison 1	Control_0_F vs Control_4_F vs Control_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 2	NR_0_F vs NR_4_F vs NR_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 3	PTSD_0_F vs PTSD_4_F vs PTSD_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 4	Control_0_M vs Control_4_M vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 5	NR_0_M vs NR_4_M vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 6	PTSD_0_M vs PTSD_4_M vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 7	Control_0 vs Control_4 vs Control_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 8	NR_0 vs NR_4 vs NR_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 9	PTSD_0 vs PTSD_4 vs PTSD_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 10	Control_0_F vs Control_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 11	Control_4_F vs Control_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 12	Control_8_F vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 13	NR_0_F vs NR_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 14	NR_4_F vs NR_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 15	NR_8_F vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 16	PTSD_0_F vs PTSD_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 17	PTSD_4_F vs PTSD_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 18	PTSD_8_F vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG

VIII. Analysis - Alpha Diversity

In ecology, alpha diversity (α-diversity) is the mean species diversity in sites or habitats at a local scale. The term was introduced by R. H. Whittaker [5][6] together with the terms beta diversity (β-diversity) and gamma diversity (γ-diversity). Whittaker's idea was that the total species diversity in a landscape (gamma diversity) is determined by two factors: the mean species diversity in sites or habitats at a more local scale (alpha diversity) and the differentiation among those habitats (beta diversity).

References:

Alpha Diversity Analysis by Rarefaction

Diversity measures are affected by sampling depth. Rarefaction is a technique to assess species richness from the results of sampling. It allows species richness to be calculated for a given number of individuals sampled (here, sequencing reads), based on the construction of so-called rarefaction curves. A rarefaction curve plots the number of species observed as a function of the number of individuals (reads) sampled. Rarefaction curves generally grow rapidly at first, as the most common species are found, but then plateau as only the rarest species remain to be sampled [7].
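A rarefaction curve for one sample can be sketched by repeatedly subsampling its reads without replacement at increasing depths and counting the distinct species seen. This is an illustrative plain-Python version, not the pipeline used for this report; the depths, the number of iterations and the example counts are hypothetical.

```python
import random


def observed_species_at_depth(counts, depth, rng):
    """Subsample `depth` reads without replacement and count distinct species."""
    # Expand per-species read counts into one entry (species index) per read.
    pool = [species for species, n in enumerate(counts) for _ in range(n)]
    return len(set(rng.sample(pool, depth)))


def rarefaction_curve(counts, depths, n_iter=20, seed=0):
    """Mean observed richness at each sequencing depth, averaged over n_iter draws."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        richness = [observed_species_at_depth(counts, depth, rng) for _ in range(n_iter)]
        curve.append(sum(richness) / n_iter)
    return curve


# Hypothetical sample: 5 species, 100 reads in total.
counts = [50, 30, 15, 4, 1]
print(rarefaction_curve(counts, depths=[1, 10, 25, 50, 100]))
```

At full depth every species is observed, so the curve ends at the sample's total richness; the rare species (1 read) is what keeps intermediate depths below the plateau.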

References:

Willis AD. Rarefaction, Alpha Diversity, and Statistics. Front Microbiol. 2019 Oct 23;10:2407. doi: 10.3389/fmicb.2019.02407. PMID: 31708888; PMCID: PMC6819366.

Boxplot of Alpha-diversity Indices

The two main factors taken into account when measuring diversity are richness and evenness. Richness is a measure of the number of different kinds of organisms present in a particular area. Evenness measures how similar the population sizes (abundances) of the species present are. There are many different ways to measure richness and evenness; these measurements are called "estimators" or "indices". Below are 3 commonly used diversity indices, showing the values for all the samples (dots) and for groups (boxes) at the species level.
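The section does not name the three indices plotted, so the sketch below uses common choices as an assumption: observed richness, the Shannon index, the Gini-Simpson index, and Pielou's evenness (Shannon divided by its maximum, ln S). It shows how richness and evenness separate: two samples with the same number of species can have very different evenness.

```python
import math


def observed_richness(counts):
    """Richness: number of species with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson 1 - sum(p_i^2): chance two reads drawn with replacement differ."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou_evenness(counts):
    """Evenness J = H' / ln(S); 1.0 means all observed species are equally abundant."""
    s = observed_richness(counts)
    return shannon_index(counts) / math.log(s) if s > 1 else 0.0


even = [25, 25, 25, 25]    # hypothetical counts, 4 species
skewed = [97, 1, 1, 1]     # same richness, dominated by one species
for name, sample in [("even", even), ("skewed", skewed)]:
    print(name, observed_richness(sample), round(shannon_index(sample), 3),
          round(simpson_index(sample), 3), round(pielou_evenness(sample), 3))
```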

Printed on top of each graph is the p-value for the difference between the groups. Significance is calculated using either the Kruskal-Wallis test or the Wilcoxon rank-sum test. Both are non-parametric methods (they do not assume normally distributed data, which microbiome data generally are not) for testing whether samples originate from the same distribution (i.e., no difference between groups). The Kruskal-Wallis test is used to compare three or more independent groups to determine whether at least one group differs from the others (often summarized as a difference in medians). The Wilcoxon rank-sum test, also known as the Mann-Whitney U test, is used to compare two independent groups to determine whether there is a significant difference between their distributions.
A p-value < 0.05 is considered statistically significant between/among the test groups.
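The group test described above can be sketched in plain Python. This is an illustrative implementation, not the report's code (which presumably uses a statistics library); it computes the tie-corrected Kruskal-Wallis H statistic and its chi-square approximation p-value. With exactly two groups, H equals the square of the normal-approximation z statistic of the Wilcoxon rank-sum test (without continuity correction), so the same function also gives the asymptotic two-group p-value. The index values in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square upper-tail probability via the incomplete-gamma power series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))  # P(a, z)
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Kruskal-Wallis H (tie-corrected) and its chi-square p-value with k - 1 df."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    tie_sizes = {}
    for v in values:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical Shannon index values for three time points.
h, p = kruskal_wallis([1.1, 1.4, 1.2], [2.0, 2.3, 2.1], [2.9, 3.1, 3.4])
print(f"H = {h:.2f}, p = {p:.4f}")
```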

Alpha Diversity Box Plots for All Groups - Species Level

Alpha Diversity Box Plots for Individual Comparisons at Species level

Comparison 1	Control_0_F vs Control_4_F vs Control_8_F	View in PDF	View in SVG
Comparison 2	NR_0_F vs NR_4_F vs NR_8_F	View in PDF	View in SVG
Comparison 3	PTSD_0_F vs PTSD_4_F vs PTSD_8_F	View in PDF	View in SVG
Comparison 4	Control_0_M vs Control_4_M vs Control_8_M	View in PDF	View in SVG
Comparison 5	NR_0_M vs NR_4_M vs NR_8_M	View in PDF	View in SVG
Comparison 6	PTSD_0_M vs PTSD_4_M vs PTSD_8_M	View in PDF	View in SVG
Comparison 7	Control_0 vs Control_4 vs Control_8	View in PDF	View in SVG
Comparison 8	NR_0 vs NR_4 vs NR_8	View in PDF	View in SVG
Comparison 9	PTSD_0 vs PTSD_4 vs PTSD_8	View in PDF	View in SVG
Comparison 10	Control_0_F vs Control_0_M	View in PDF	View in SVG
Comparison 11	Control_4_F vs Control_4_M	View in PDF	View in SVG
Comparison 12	Control_8_F vs Control_8_M	View in PDF	View in SVG
Comparison 13	NR_0_F vs NR_0_M	View in PDF	View in SVG
Comparison 14	NR_4_F vs NR_4_M	View in PDF	View in SVG
Comparison 15	NR_8_F vs NR_8_M	View in PDF	View in SVG
Comparison 16	PTSD_0_F vs PTSD_0_M	View in PDF	View in SVG
Comparison 17	PTSD_4_F vs PTSD_4_M	View in PDF	View in SVG
Comparison 18	PTSD_8_F vs PTSD_8_M	View in PDF	View in SVG

The above comparisons are at the species level. Comparisons at other taxonomy levels, from phylum to genus, are also available:

IX. Analysis - Beta Diversity

NMDS and PCoA Plots

Beta diversity compares the similarity (or dissimilarity) of microbial profiles between different groups of samples. There are many different similarity/dissimilarity metrics [8]. In general, they can be quantitative (using sequence abundance, e.g., Bray-Curtis or weighted UniFrac) or binary (considering only the presence/absence of sequences, e.g., binary Jaccard or unweighted UniFrac). They can also be based on phylogeny (e.g., UniFrac metrics) or not (non-UniFrac metrics such as Bray-Curtis).

For microbiome studies, the species profiles of samples can be compared with the Bray-Curtis dissimilarity, which is computed from abundance (count) data. The pairwise Bray-Curtis dissimilarity matrix of all samples can then be subjected to either multi-dimensional scaling (MDS, also known as PCoA) or non-metric MDS (NMDS).
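The quantitative/binary distinction above can be made concrete with the standard definitions of Bray-Curtis dissimilarity and binary Jaccard distance. This plain-Python sketch builds a pairwise matrix from hypothetical per-sample species counts; it is an illustration of the formulas, not the software used to produce the plots in this report.

```python
def bray_curtis(u, v):
    """Quantitative: sum|u_i - v_i| / sum(u_i + v_i); 0 = identical, 1 = no shared species."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(u) + sum(v)
    return numerator / denominator if denominator else 0.0


def binary_jaccard(u, v):
    """Binary: 1 - |species in both samples| / |species in either sample|."""
    present_u = {i for i, x in enumerate(u) if x > 0}
    present_v = {i for i, x in enumerate(v) if x > 0}
    union = present_u | present_v
    return 1.0 - len(present_u & present_v) / len(union) if union else 0.0


def dissimilarity_matrix(samples, metric):
    """Symmetric pairwise matrix (list of lists) for a list of count vectors."""
    n = len(samples)
    return [[metric(samples[i], samples[j]) for j in range(n)] for i in range(n)]


# Hypothetical species counts for three samples.
samples = [[10, 0, 5], [5, 5, 5], [0, 20, 0]]
print(dissimilarity_matrix(samples, bray_curtis))
print(dissimilarity_matrix(samples, binary_jaccard))
```

Note how the first two samples share all species present in the first one, so Jaccard only sees the extra species in the second sample, while Bray-Curtis also reflects the differences in counts.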

MDS/PCoA is a scaling or ordination method that starts with a matrix of similarities or dissimilarities between a set of samples and aims to produce a low-dimensional graphical plot of the data in which the distances between points are close to the original dissimilarities.
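Classical PCoA can be sketched with NumPy: square the dissimilarities, double-center them (Gower centering), and use the top eigenvectors scaled by the square roots of their eigenvalues as coordinates. This is a minimal illustration, not the report's implementation; in particular, clipping negative eigenvalues (which non-Euclidean measures such as Bray-Curtis can produce) to zero is one simple choice among several correction methods.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical MDS / PCoA of a symmetric distance (dissimilarity) matrix.

    Returns sample coordinates and the fraction of positive eigenvalue mass
    captured by each retained axis.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering          # Gower double-centering
    eigvals, eigvecs = np.linalg.eigh(b)           # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero here.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical check: Euclidean distances between four points in a plane.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
coords, explained = pcoa(dist)
print(np.round(coords, 3), np.round(explained, 3))
```

For Euclidean input like this, two axes reproduce the original distances exactly (up to rotation and reflection of the plot).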

NMDS is similar to MDS; however, instead of using the dissimilarity values themselves, it converts them into ranks and uses only these ranks in the calculation.
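Why only ranks matter in NMDS can be seen from the quantity it minimizes, Kruskal's stress. The sketch below evaluates stress-1 for a given configuration: plot distances are fitted to the dissimilarities by monotone (isotonic) regression, computed with the pool-adjacent-violators algorithm. It does not perform the iterative optimization that an NMDS program runs to move points toward lower stress; the function names and the random configuration are illustrative.

```python
import numpy as np


def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the block means are decreasing.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.array([s / c for s, c in blocks for _ in range(c)])


def kruskal_stress1(dissim, coords):
    """Stress-1 of a configuration; depends on dissimilarities only through their order."""
    n = dissim.shape[0]
    iu = np.triu_indices(n, k=1)
    delta = dissim[iu]
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))[iu]
    order = np.argsort(delta, kind="mergesort")
    d_hat = np.empty_like(d)
    d_hat[order] = pava(d[order])  # monotone regression of distances on dissimilarity rank
    return np.sqrt(((d - d_hat) ** 2).sum() / (d ** 2).sum())


rng = np.random.default_rng(1)
config = rng.normal(size=(6, 2))
plot_dist = np.sqrt(((config[:, None, :] - config[None, :, :]) ** 2).sum(axis=-1))
# Any monotone transform of the dissimilarities (here, cubing) keeps stress at zero.
print(kruskal_stress1(plot_dist ** 3, config))
```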

References:

Plantinga, AM, Wu, MC (2021). Beta Diversity and Distance-Based Analysis of Microbiome Data. In: Datta, S., Guha, S. (eds) Statistical Analysis of Microbiome Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-73351-3_5

In our beta diversity analysis, the Bray-Curtis dissimilarity matrix was first calculated and then ordinated by PCoA and NMDS separately. Below are the beta diversity results for all groups together, at the species level:

NMDS and PCoA Plots for All Groups - Species Level

The above PCoA and NMDS plots are based on count data. The count data can also be transformed into centered log-ratio (CLR) values for each species. CLR-transformed data are no longer count data (they can be negative) and cannot be used to calculate Bray-Curtis dissimilarity. Instead, CLR profiles can be compared using Euclidean distance; the Euclidean distance between CLR-transformed profiles is also called the Aitchison distance.
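The CLR transform and the Aitchison distance described above can be sketched in a few lines. Zero counts must be handled before taking logarithms; the pseudocount of 1 used here is an assumption for illustration, since this section does not state how the report treats zeros. With zero-free data and no pseudocount, the distance is unchanged when a sample's counts are scaled, reflecting that CLR depends only on ratios within a sample.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log(x_i) minus the mean log (the log of the geometric mean)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(u, v, pseudocount=1.0):
    """Euclidean distance between the CLR-transformed profiles of two samples."""
    a, b = clr(u, pseudocount), clr(v, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


sample_1 = [120, 30, 0, 850]  # hypothetical species counts
sample_2 = [60, 90, 15, 400]
print(clr(sample_1))
print(aitchison_distance(sample_1, sample_2))
```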

Below are the NMDS and PCoA plots of the Aitchison distances of the samples at the Species level:

NMDS and PCoA Plots for Individual Comparisons at Species level

Comparison No.	Comparison Name	NMDS				PCoA
Comparison No.	Comparison Name	Bray-Curtis		CLR Euclidean		Bray-Curtis		CLR Euclidean
Comparison 1	Control_0_F vs Control_4_F vs Control_8_F	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 2	NR_0_F vs NR_4_F vs NR_8_F	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 3	PTSD_0_F vs PTSD_4_F vs PTSD_8_F	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 4	Control_0_M vs Control_4_M vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 5	NR_0_M vs NR_4_M vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 6	PTSD_0_M vs PTSD_4_M vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 7	Control_0 vs Control_4 vs Control_8	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 8	NR_0 vs NR_4 vs NR_8	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 9	PTSD_0 vs PTSD_4 vs PTSD_8	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 10	Control_0_F vs Control_0_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 11	Control_4_F vs Control_4_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 12	Control_8_F vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 13	NR_0_F vs NR_0_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 14	NR_4_F vs NR_4_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 15	NR_8_F vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 16	PTSD_0_F vs PTSD_0_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 17	PTSD_4_F vs PTSD_4_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 18	PTSD_8_F vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG

Interactive 3D PCoA Plots - Bray-Curtis Dissimilarity

Interactive 3D PCoA Plots - Euclidean Distance

Interactive 3D PCoA Plots - Correlation Coefficients

X. Analysis - Differential Abundance

16S rRNA next-generation sequencing (NGS) generates a fixed number of reads that reflect the proportions of different species in a sample, i.e., the relative abundance of species rather than their absolute abundance. This makes microbiome read count data "compositional" (Gloor et al., 2017). In mathematics, measurements involving probabilities, proportions, percentages, and ppm can all be thought of as compositional data; in general, compositional data represent parts of a whole and carry only relative information [9].

The compositional nature of microbiome data becomes a problem when comparing two groups of samples to identify "differentially abundant" species. Even if a species has the same absolute abundance in two conditions, its relative abundance (e.g., percent abundance) can differ between the conditions if the abundances of other species change greatly. This can lead to incorrect conclusions about which microbial species are differentially abundant.
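A two-species toy example makes the problem concrete. The absolute abundances below are hypothetical: species A is unchanged between conditions while species B increases nine-fold, yet A's share of a fixed number of sequencing reads drops from 50% to 10%.

```python
def relative_abundance(abundances):
    """Convert absolute abundances into proportions of the total."""
    total = sum(abundances)
    return [a / total for a in abundances]


# Hypothetical absolute abundances (e.g., cells) of species A and B.
condition_1 = {"A": 100, "B": 100}
condition_2 = {"A": 100, "B": 900}  # A unchanged, B increased nine-fold

for name, cond in [("condition 1", condition_1), ("condition 2", condition_2)]:
    rel = relative_abundance(list(cond.values()))
    expected_reads_a = rel[0] * 10_000  # at a fixed depth of 10,000 reads
    print(f"{name}: A = {rel[0]:.0%} of reads (~{expected_reads_a:.0f} reads)")
```

A test on percent abundance would report A as decreased, even though its absolute abundance did not change.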

When studying differential abundance (DA), a better approach is to transform the read count data into log-ratio data. The ratios are calculated between the read counts of all species in a sample and a "reference" count (e.g., the geometric mean of the sample's read counts, as in the CLR transform). Log-ratio data allow the detection of DA species without being affected by the percentage bias described above.

In this report, the compositional DA analysis tool ANCOM (analysis of composition of microbiomes) was used [10]. ANCOM transforms the count data into log-ratios and is thus better suited for comparing the composition of microbiomes in two or more populations. ANCOM generates a table of features with their W-statistics and whether the null hypothesis is rejected for each. The W-statistic of a feature is the number of other features against which its log-ratio was tested to be significantly different between groups. Hence, the higher the W, the stronger the statistical evidence that a feature/species is differentially abundant.
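How W is counted can be sketched as follows. This is a simplified illustration under stated assumptions, not the ANCOM implementation: zeros get a pseudocount of 1, each pairwise log-ratio is compared between two groups with a Wilcoxon rank-sum test (normal approximation), and a fixed α = 0.05 is used without the multiple-testing correction that ANCOM applies. ANCOM then compares W against a cutoff expressed as a fraction of the number of features; that step is omitted here. The data are hypothetical: only feature 0 changes between groups.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def log_ratios(group, i, j, pseudocount):
    """log((x_i + pc) / (x_j + pc)) for every sample in a group."""
    return [math.log((s[i] + pseudocount) / (s[j] + pseudocount)) for s in group]


def ancom_w(group_a, group_b, alpha=0.05, pseudocount=1.0):
    """Simplified W: per feature, count partners whose log-ratio differs between groups."""
    n_features = len(group_a[0])
    w = [0] * n_features
    for i in range(n_features):
        for j in range(i + 1, n_features):
            p = rank_sum_p(log_ratios(group_a, i, j, pseudocount),
                           log_ratios(group_b, i, j, pseudocount))
            if p < alpha:
                w[i] += 1  # a significant (i, j) log-ratio counts for both features
                w[j] += 1
    return w


# Hypothetical counts: rows are samples, columns are features 0..3.
group_a = [[10, 50, 30, 80], [12, 52, 31, 78], [9, 48, 29, 82],
           [11, 51, 33, 79], [10, 49, 30, 81], [13, 50, 28, 80]]
group_b = [[100, 50, 30, 80], [110, 52, 31, 78], [95, 48, 29, 82],
           [105, 51, 33, 79], [98, 49, 30, 81], [102, 50, 28, 80]]
print(ancom_w(group_a, group_b))  # → [3, 1, 1, 1]
```

Feature 0 reaches the maximum W (all 3 other features), while the unchanged features each pick up only the single log-ratio they share with feature 0, which is why a high W, not merely a non-zero W, indicates differential abundance.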

References:

Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134.
Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.

ANCOM Differential Abundance Analysis

ANCOM Results for Individual Comparisons

Comparison No.	Comparison Name
Comparison 1.	Control_0_F vs Control_4_F vs Control_8_F
Comparison 2.	NR_0_F vs NR_4_F vs NR_8_F
Comparison 3.	PTSD_0_F vs PTSD_4_F vs PTSD_8_F
Comparison 4.	Control_0_M vs Control_4_M vs Control_8_M
Comparison 5.	NR_0_M vs NR_4_M vs NR_8_M
Comparison 6.	PTSD_0_M vs PTSD_4_M vs PTSD_8_M
Comparison 7.	Control_0 vs Control_4 vs Control_8
Comparison 8.	NR_0 vs NR_4 vs NR_8
Comparison 9.	PTSD_0 vs PTSD_4 vs PTSD_8
Comparison 10.	Control_0_F vs Control_0_M
Comparison 11.	Control_4_F vs Control_4_M
Comparison 12.	Control_8_F vs Control_8_M
Comparison 13.	NR_0_F vs NR_0_M
Comparison 14.	NR_4_F vs NR_4_M
Comparison 15.	NR_8_F vs NR_8_M
Comparison 16.	PTSD_0_F vs PTSD_0_M
Comparison 17.	PTSD_4_F vs PTSD_4_M
Comparison 18.	PTSD_8_F vs PTSD_8_M

ANCOM-BC2 Differential Abundance Analysis

Starting with version V1.2, we include the results of ANCOM-BC (Analysis of Compositions of Microbiomes with Bias Correction) (Lin and Peddada 2020) [11]. ANCOM-BC is an updated version of "ANCOM" that:
(a) provides a statistically valid test with appropriate p-values,
(b) provides confidence intervals for differential abundance of each taxon,
(c) controls the False Discovery Rate (FDR),
(d) maintains adequate power, and
(e) is computationally simple to implement.

The bias correction (BC) addresses a challenging problem: the bias introduced by differences in sampling fractions across samples. This bias has been a major hurdle in performing DA analysis of microbiome data. ANCOM-BC estimates the unknown sampling fractions and corrects the bias induced by their differences among samples. The absolute abundance data are modeled using a linear regression framework.
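The role of the sampling fraction can be written out schematically. The notation below is ours and simplified (a single two-group covariate, and the log of the expectation treated as the expected log); it is meant to show the structure of the bias, not to reproduce the authors' exact formulation. Here o_ij is the observed read count of taxon i in sample j, a_ij its unobserved absolute abundance, c_j the sampling fraction of sample j, and x_j a 0/1 group indicator. Because d_j = log c_j enters every taxon in sample j equally, a naive regression shifts every taxon's log fold change by the same offset δ; as we understand it, ANCOM-BC estimates this offset from the data, relying on the assumption that most taxa are not differentially abundant, and removes it.

```latex
\mathbb{E}\left[o_{ij} \mid a_{ij}\right] = c_j \, a_{ij}
\quad\Longrightarrow\quad
\log o_{ij} \approx \underbrace{\log c_j}_{d_j} + \beta_{i0} + \beta_{i1} x_j + \varepsilon_{ij}

\hat{\beta}_{i1}^{\,\mathrm{naive}} \approx \beta_{i1} + \delta,
\qquad
\delta = \bar{d}_{\,x=1} - \bar{d}_{\,x=0}
```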

Starting with version V1.43, ANCOM-BC2 is used instead of ANCOM-BC, so that multiple pairwise directional tests can be performed (if there are more than two groups in a comparison). When performing pairwise directional tests, the mixed directional false discovery rate (mdFDR) is taken into account. The mdFDR combines the false discoveries arising from multiple testing, multiple pairwise comparisons, and directional tests within each pairwise comparison. The mdFDR approach is adopted from Guo, Sarkar, and Peddada 2010 [12] and Grandhi, Guo, and Peddada 2016 [13]. For a more detailed explanation and additional features of ANCOM-BC2, please see the authors' documentation.

References:

Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7. PMID: 32665548; PMCID: PMC7360769.
Guo W, Sarkar SK, Peddada SD. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics. 2010 Jun;66(2):485-92. doi: 10.1111/j.1541-0420.2009.01292.x. Epub 2009 Jul 23. PMID: 19645703; PMCID: PMC2895927.
Grandhi A, Guo W, Peddada SD. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinformatics. 2016 Feb 25;17:104. doi: 10.1186/s12859-016-0937-5. PMID: 26917217; PMCID: PMC4768411.

ANCOM-BC2 Results for Individual Comparisons

Comparison No.	Comparison Name
Comparison 1.	Control_0_F vs Control_4_F vs Control_8_F
Comparison 2.	NR_0_F vs NR_4_F vs NR_8_F
Comparison 3.	PTSD_0_F vs PTSD_4_F vs PTSD_8_F
Comparison 4.	Control_0_M vs Control_4_M vs Control_8_M
Comparison 5.	NR_0_M vs NR_4_M vs NR_8_M
Comparison 6.	PTSD_0_M vs PTSD_4_M vs PTSD_8_M
Comparison 7.	Control_0 vs Control_4 vs Control_8
Comparison 8.	NR_0 vs NR_4 vs NR_8
Comparison 9.	PTSD_0 vs PTSD_4 vs PTSD_8
Comparison 10.	Control_0_F vs Control_0_M
Comparison 11.	Control_4_F vs Control_4_M
Comparison 12.	Control_8_F vs Control_8_M
Comparison 13.	NR_0_F vs NR_0_M
Comparison 14.	NR_4_F vs NR_4_M
Comparison 15.	NR_8_F vs NR_8_M
Comparison 16.	PTSD_0_F vs PTSD_0_M
Comparison 17.	PTSD_4_F vs PTSD_4_M
Comparison 18.	PTSD_8_F vs PTSD_8_M

LEfSe - Linear Discriminant Analysis Effect Size

LEfSe (Linear Discriminant Analysis Effect Size) is an alternative method to find "organisms, genes, or pathways that consistently explain the differences between two or more microbial communities" (Segata et al., 2011) [14]. Specifically, LEfSe uses the rank-based Kruskal-Wallis (KW) rank-sum test to detect features with significantly different (relative) abundance with respect to the class of interest, and then uses linear discriminant analysis (LDA) to estimate the effect size of each such feature. Because the test is rank-based rather than proportion-based, the differentially abundant species it identifies among the comparison groups are less biased than those identified by tests based on percent abundance.

Reference:

Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011 Jun 24;12(6):R60. doi: 10.1186/gb-2011-12-6-r60. PMID: 21702898; PMCID: PMC3218848.

Control_0_F vs Control_4_F vs Control_8_F

LEfSe Results for All Comparisons

Comparison No.	Comparison Name
Comparison 1.	Control_0_F vs Control_4_F vs Control_8_F
Comparison 2.	NR_0_F vs NR_4_F vs NR_8_F
Comparison 3.	PTSD_0_F vs PTSD_4_F vs PTSD_8_F
Comparison 4.	Control_0_M vs Control_4_M vs Control_8_M
Comparison 5.	NR_0_M vs NR_4_M vs NR_8_M
Comparison 6.	PTSD_0_M vs PTSD_4_M vs PTSD_8_M
Comparison 7.	Control_0 vs Control_4 vs Control_8
Comparison 8.	NR_0 vs NR_4 vs NR_8
Comparison 9.	PTSD_0 vs PTSD_4 vs PTSD_8
Comparison 10.	Control_0_F vs Control_0_M
Comparison 11.	Control_4_F vs Control_4_M
Comparison 12.	Control_8_F vs Control_8_M
Comparison 13.	NR_0_F vs NR_0_M
Comparison 14.	NR_4_F vs NR_4_M
Comparison 15.	NR_8_F vs NR_8_M
Comparison 16.	PTSD_0_F vs PTSD_0_M
Comparison 17.	PTSD_4_F vs PTSD_4_M
Comparison 18.	PTSD_8_F vs PTSD_8_M

XI. Analysis - Heatmap Profile

Species vs Sample Abundance Heatmap for All Samples

Heatmaps for Individual Comparisons

A) Two-way clustering - clustered on both columns (Samples) and rows (organism)

Comparison No.	Comparison Name	Family Level		Genus Level		Species Level
Comparison 1	Control_0_F vs Control_4_F vs Control_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 2	NR_0_F vs NR_4_F vs NR_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 3	PTSD_0_F vs PTSD_4_F vs PTSD_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 4	Control_0_M vs Control_4_M vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 5	NR_0_M vs NR_4_M vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 6	PTSD_0_M vs PTSD_4_M vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 7	Control_0 vs Control_4 vs Control_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 8	NR_0 vs NR_4 vs NR_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 9	PTSD_0 vs PTSD_4 vs PTSD_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 10	Control_0_F vs Control_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 11	Control_4_F vs Control_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 12	Control_8_F vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 13	NR_0_F vs NR_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 14	NR_4_F vs NR_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 15	NR_8_F vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 16	PTSD_0_F vs PTSD_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 17	PTSD_4_F vs PTSD_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 18	PTSD_8_F vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG

B) One-way clustering - clustered on rows (organism) only

Comparison No.	Comparison Name	Family Level		Genus Level		Species Level
Comparison 1	Control_0_F vs Control_4_F vs Control_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 2	NR_0_F vs NR_4_F vs NR_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 3	PTSD_0_F vs PTSD_4_F vs PTSD_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 4	Control_0_M vs Control_4_M vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 5	NR_0_M vs NR_4_M vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 6	PTSD_0_M vs PTSD_4_M vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 7	Control_0 vs Control_4 vs Control_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 8	NR_0 vs NR_4 vs NR_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 9	PTSD_0 vs PTSD_4 vs PTSD_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 10	Control_0_F vs Control_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 11	Control_4_F vs Control_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 12	Control_8_F vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 13	NR_0_F vs NR_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 14	NR_4_F vs NR_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 15	NR_8_F vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 16	PTSD_0_F vs PTSD_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 17	PTSD_4_F vs PTSD_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 18	PTSD_8_F vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG

C) No clustering

Comparison No.	Comparison Name	Family Level		Genus Level		Species Level
Comparison 1	Control_0_F vs Control_4_F vs Control_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 2	NR_0_F vs NR_4_F vs NR_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 3	PTSD_0_F vs PTSD_4_F vs PTSD_8_F	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 4	Control_0_M vs Control_4_M vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 5	NR_0_M vs NR_4_M vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 6	PTSD_0_M vs PTSD_4_M vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 7	Control_0 vs Control_4 vs Control_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 8	NR_0 vs NR_4 vs NR_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 9	PTSD_0 vs PTSD_4 vs PTSD_8	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 10	Control_0_F vs Control_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 11	Control_4_F vs Control_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 12	Control_8_F vs Control_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 13	NR_0_F vs NR_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 14	NR_4_F vs NR_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 15	NR_8_F vs NR_8_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 16	PTSD_0_F vs PTSD_0_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 17	PTSD_4_F vs PTSD_4_M	PDF	SVG	PDF	SVG	PDF	SVG
Comparison 18	PTSD_8_F vs PTSD_8_M	PDF	SVG	PDF	SVG	PDF	SVG

XII. Analysis - Network Association

To analyze co-occurrence or co-exclusion between microbial species across samples, network correlation analysis tools are commonly used. However, microbiome count data are compositional. If count data are normalized to the total number of counts in the sample, the values are no longer independent, and traditional statistical metrics (e.g., correlation) for detecting species-species relationships can lead to spurious results. In addition, sequencing-based studies typically measure hundreds of OTUs (species) in few samples; thus, inference of OTU-OTU association networks is severely underpowered. We provide network association results computed with SparCC (Sparse Correlations for Compositional data) (Friedman & Alm 2012), a method for inferring correlations from compositional data. SparCC estimates the linear Pearson correlations between the log-transformed components.
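The core of SparCC can be sketched with NumPy. Log-ratio variances t_ij = Var(log(x_i/x_j)) are unaffected by compositional closure; writing t_ij = ω_i² + ω_j² − 2ρ_ij ω_i ω_j and assuming correlations are sparse gives a linear system for the log-variances ω_i², from which each ρ_ij follows. This is only the basic single-pass estimator: the published method additionally iterates (excluding strongly correlated pairs) and averages over fractions sampled from a Dirichlet distribution, neither of which is reproduced here. The pseudocount default and the simulated data are assumptions for illustration.

```python
import numpy as np


def sparcc_basic(counts, pseudocount=1.0):
    """Single-pass SparCC correlation estimate (no iterative exclusion, no resampling).

    counts: samples x components array of counts (or fractions).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    # Log fractions; closing to proportions does not change any log-ratio.
    log_x = np.log(x / x.sum(axis=1, keepdims=True))
    d = log_x.shape[1]
    # Variation matrix: t_ij = Var(log(x_i / x_j)) across samples.
    log_ratios = log_x[:, :, None] - log_x[:, None, :]
    t = log_ratios.var(axis=0)
    # Sparsity assumption: t_i = sum_j t_ij ~= (d - 2) w_i^2 + sum_j w_j^2.
    system = (d - 2) * np.eye(d) + np.ones((d, d))
    omega2 = np.clip(np.linalg.solve(system, t.sum(axis=1)), 1e-12, None)
    omega = np.sqrt(omega2)
    # t_ij = w_i^2 + w_j^2 - 2 rho_ij w_i w_j, solved for rho_ij.
    return (omega2[:, None] + omega2[None, :] - t) / (2.0 * np.outer(omega, omega))


# Hypothetical data: component 1 tracks component 0; the other 18 are independent.
rng = np.random.default_rng(0)
log_abundance = rng.normal(size=(300, 20))
log_abundance[:, 1] = log_abundance[:, 0] + 0.1 * rng.normal(size=300)
fractions = np.exp(log_abundance)
fractions /= fractions.sum(axis=1, keepdims=True)  # only relative data are observed
rho = sparcc_basic(fractions, pseudocount=0.0)
print(round(rho[0, 1], 2), round(rho[2, 3], 2))
```

The strongly linked pair is recovered as highly correlated even though only proportions were observed, while an unlinked pair stays near zero.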

References:

Friedman J, Alm EJ. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012;8(9):e1002687. doi: 10.1371/journal.pcbi.1002687. Epub 2012 Sep 20. PMID: 23028285; PMCID: PMC3447976.

Association Network Inference by SparCC

XIII. Disclaimer

The results of this analysis are for research purposes only. They are not intended to diagnose, treat, cure, or prevent any disease. Forsyth and FOMC are not responsible for any use of the information provided in this report outside of research.