FOMC Service Report

16S rRNA Gene V1V3 Amplicon Sequencing

Version V1.43

Version History

The Forsyth Institute, Cambridge, MA, USA
June 09, 2023

Project ID: FOMC9928_1


I. Project Summary

Project FOMC9928_1 services include NGS sequencing of V1V3 region amplicons of the 16S rRNA gene from the samples. First and foremost, please download this report, as well as the raw sequence data, from the download links provided below. These links will expire after 60 days. We cannot guarantee the availability of your data after 60 days.

Full bioinformatics analysis service was requested. We provide many analyses, starting with raw sequence quality and noise filtering, paired-read merging, and chimera filtering of the sequences, using the DADA2 denoising algorithm and pipeline.

We also provide many downstream analyses such as taxonomy assignment, alpha and beta diversity analyses, and differential abundance analysis.

For taxonomy assignment, the most informative results are the taxonomy barplots. We provide interactive barplots that show the relative abundance of microbes at a taxonomy level of your choice (from phylum to species).

If you specify which groups of samples you want to compare for differential abundance, we provide both ANCOM and LEfSe differential abundance analyses.

 

II. Workflow Checklist

1. Sample Received
2. Sample Quality Evaluated
3. Sample Prepared for Sequencing
4. Next-Gen Sequencing
5. Sequence Quality Check
6. Absolute Abundance
7. Report and Raw Sequence Data Available for Download
8. Bioinformatics Analysis - Reads Processing (DADA2 Quality Trimming, Denoising, Paired Reads Merging)
9. Bioinformatics Analysis - Reads Taxonomy Assignment
10. Bioinformatics Analysis - Alpha Diversity Analysis
11. Bioinformatics Analysis - Beta Diversity Analysis
12. Bioinformatics Analysis - Differential Abundance Analysis
13. Bioinformatics Analysis - Heatmap Profile
14. Bioinformatics Analysis - Network Association
 

III. NGS Sequencing

The samples were processed and analyzed with the ZymoBIOMICS® Service: Targeted Metagenomic Sequencing (Zymo Research, Irvine, CA).

DNA Extraction: If DNA extraction was performed, one of three DNA extraction kits was used, depending on the sample type and sample volume, according to the manufacturer’s instructions unless otherwise stated. The kit used in this project is marked below:

ZymoBIOMICS® DNA Miniprep Kit (Zymo Research, Irvine, CA)
ZymoBIOMICS® DNA Microprep Kit (Zymo Research, Irvine, CA)
ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA)
N/A (DNA Extraction Not Performed)
Elution Volume: 50µL
Additional Notes: NA

Targeted Library Preparation: The DNA samples were prepared for targeted sequencing with the Quick-16S™ NGS Library Prep Kit (Zymo Research, Irvine, CA). These primers were custom designed by Zymo Research to provide the best coverage of the 16S gene while maintaining high sensitivity. The primer sets used in this project are marked below:

Quick-16S™ Primer Set V1-V2 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V1-V3 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V3-V4 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V4 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V6-V8 (Zymo Research, Irvine, CA)
Other: NA
Additional Notes: NA

The sequencing library was prepared using a library preparation process in which PCR reactions were performed in real-time PCR machines to control cycles and therefore limit PCR chimera formation. The final PCR products were quantified with qPCR fluorescence readings and pooled together based on equal molarity. The final pooled library was cleaned up with the Select-a-Size DNA Clean & Concentrator™ (Zymo Research, Irvine, CA), then quantified with TapeStation® (Agilent Technologies, Santa Clara, CA) and Qubit® (Thermo Fisher Scientific, Waltham, MA).

Control Samples: The ZymoBIOMICS® Microbial Community Standard (Zymo Research, Irvine, CA) was used as a positive control for each DNA extraction, if performed. The ZymoBIOMICS® Microbial Community DNA Standard (Zymo Research, Irvine, CA) was used as a positive control for each targeted library preparation. Negative controls (i.e. blank extraction control, blank library preparation control) were included to assess the level of bioburden carried by the wet-lab process.

Sequencing: The final library was sequenced on Illumina® MiSeq™ with a V3 reagent kit (600 cycles). The sequencing was performed with 10% PhiX spike-in.

Absolute Abundance Quantification*: A quantitative real-time PCR was set up with a standard curve. The standard curve was made with plasmid DNA containing one copy of the 16S gene and one copy of the fungal ITS2 region prepared in 10-fold serial dilutions. The primers used were the same as those used in Targeted Library Preparation. The equation generated by the plasmid DNA standard curve was used to calculate the number of gene copies in the reaction for each sample. The PCR input volume (2 µl) was used to calculate the number of gene copies per microliter in each DNA sample.
The number of genome copies per microliter DNA sample was calculated by dividing the gene copy number by an assumed number of gene copies per genome. The value used for 16S copies per genome is 4. The value used for ITS copies per genome is 200. The amount of DNA per microliter DNA sample was calculated using an assumed genome size of 4.64 × 10^6 bp, the genome size of Escherichia coli, for 16S samples, or an assumed genome size of 1.20 × 10^7 bp, the genome size of Saccharomyces cerevisiae, for ITS samples. This calculation is shown below:

Calculated Total DNA = Calculated Total Genome Copies × Assumed Genome Size (4.64 × 10^6 bp) ×
Average Molecular Weight of a DNA bp (660 g/mole/bp) ÷ Avogadro’s Number (6.022 × 10^23/mole)
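
As a worked illustration of the calculation above, the following minimal Python sketch converts a gene copy number into genome copies and then into a DNA mass. The constants are the ones stated in this section; the function names and the input value of 4.0 × 10^6 gene copies per microliter are hypothetical and chosen only for the example.

```python
# Constants taken from the text of this section.
AVOGADRO = 6.022e23        # molecules per mole
BP_MOLAR_MASS = 660.0      # g/mole per base pair (average)

def genome_copies(gene_copies, copies_per_genome=4.0):
    """Genome copies from gene copies (4 for 16S, 200 for ITS per the report)."""
    return gene_copies / copies_per_genome

def total_dna_grams(genomes, genome_size_bp=4.64e6):
    """Calculated Total DNA in grams (default genome size: E. coli, 16S samples)."""
    return genomes * genome_size_bp * BP_MOLAR_MASS / AVOGADRO

# Hypothetical input: 4.0e6 16S gene copies per microliter of DNA sample.
gene_copies_per_ul = 4.0e6
genomes_per_ul = genome_copies(gene_copies_per_ul)          # 1.0e6 genomes
dna_ng_per_ul = total_dna_grams(genomes_per_ul) * 1e9       # grams to nanograms
print(f"{genomes_per_ul:.3g} genomes/ul, {dna_ng_per_ul:.2f} ng/ul")
```

For ITS samples the same functions would be called with copies_per_genome=200 and genome_size_bp=1.20e7.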


* Absolute Abundance Quantification is only available for 16S and ITS analyses.

The absolute abundance standard curve data can be viewed in Excel here:

The absolute abundance standard curve is shown below:

Absolute Abundance Standard Curve

 

IV. Complete Report Download

The complete report of your project, including all links in this report, can be downloaded by clicking the link provided below. The downloaded file is a compressed ZIP file. Once it is unzipped, open the file “REPORT.html” (it may be shown only as "REPORT" on your computer) by double-clicking it. Your default web browser will open it and you will see the exact content of this report.

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Complete report download link:

To view the report, please follow these steps:
1. Download the .zip file from the report link above.
2. Extract all the contents of the downloaded .zip file to your desktop.
3. Open the extracted folder and find "REPORT.html" (it may be shown only as "REPORT").
4. Open the REPORT.html file by double-clicking it. Your default browser will open the top page of the complete report. Within the report, there are links to view all the analyses performed for the project.

 

V. Raw Sequence Data Download

The raw NGS sequence data is available for download with the link provided below. The data is a compressed ZIP file and can be unzipped to individual sequence files. Since this is paired-end sequencing, each of your samples is represented by two sequence files, one for READ 1, with the file extension “*_R1.fastq.gz”, and another for READ 2, with the file extension “*_R2.fastq.gz”. The files are in FASTQ format and are compressed. FASTQ is a text-based data format for storing both a biological sequence and its corresponding quality scores. Most sequence analysis software will be able to open them. The Sample IDs associated with the R1 and R2 fastq files are listed in the table below:

Sample ID | Original Sample ID | Read 1 File Name | Read 2 File Name
F9929.S10 | original sample ID here | zr9929_10V1V3_R1.fastq.gz | zr9929_10V1V3_R2.fastq.gz
F9929.S11 | original sample ID here | zr9929_11V1V3_R1.fastq.gz | zr9929_11V1V3_R2.fastq.gz
F9929.S12 | original sample ID here | zr9929_12V1V3_R1.fastq.gz | zr9929_12V1V3_R2.fastq.gz
F9929.S13 | original sample ID here | zr9929_13V1V3_R1.fastq.gz | zr9929_13V1V3_R2.fastq.gz
F9929.S14 | original sample ID here | zr9929_14V1V3_R1.fastq.gz | zr9929_14V1V3_R2.fastq.gz
F9929.S15 | original sample ID here | zr9929_15V1V3_R1.fastq.gz | zr9929_15V1V3_R2.fastq.gz
F9929.S16 | original sample ID here | zr9929_16V1V3_R1.fastq.gz | zr9929_16V1V3_R2.fastq.gz
F9929.S17 | original sample ID here | zr9929_17V1V3_R1.fastq.gz | zr9929_17V1V3_R2.fastq.gz
F9929.S18 | original sample ID here | zr9929_18V1V3_R1.fastq.gz | zr9929_18V1V3_R2.fastq.gz
F9929.S19 | original sample ID here | zr9929_19V1V3_R1.fastq.gz | zr9929_19V1V3_R2.fastq.gz
F9929.S01 | original sample ID here | zr9929_1V1V3_R1.fastq.gz | zr9929_1V1V3_R2.fastq.gz
F9929.S20 | original sample ID here | zr9929_20V1V3_R1.fastq.gz | zr9929_20V1V3_R2.fastq.gz
F9929.S21 | original sample ID here | zr9929_21V1V3_R1.fastq.gz | zr9929_21V1V3_R2.fastq.gz
F9929.S22 | original sample ID here | zr9929_22V1V3_R1.fastq.gz | zr9929_22V1V3_R2.fastq.gz
F9929.S23 | original sample ID here | zr9929_23V1V3_R1.fastq.gz | zr9929_23V1V3_R2.fastq.gz
F9929.S24 | original sample ID here | zr9929_24V1V3_R1.fastq.gz | zr9929_24V1V3_R2.fastq.gz
F9929.S25 | original sample ID here | zr9929_25V1V3_R1.fastq.gz | zr9929_25V1V3_R2.fastq.gz
F9929.S26 | original sample ID here | zr9929_26V1V3_R1.fastq.gz | zr9929_26V1V3_R2.fastq.gz
F9929.S27 | original sample ID here | zr9929_27V1V3_R1.fastq.gz | zr9929_27V1V3_R2.fastq.gz
F9929.S28 | original sample ID here | zr9929_28V1V3_R1.fastq.gz | zr9929_28V1V3_R2.fastq.gz
F9929.S29 | original sample ID here | zr9929_29V1V3_R1.fastq.gz | zr9929_29V1V3_R2.fastq.gz
F9929.S02 | original sample ID here | zr9929_2V1V3_R1.fastq.gz | zr9929_2V1V3_R2.fastq.gz
F9929.S30 | original sample ID here | zr9929_30V1V3_R1.fastq.gz | zr9929_30V1V3_R2.fastq.gz
F9929.S31 | original sample ID here | zr9929_31V1V3_R1.fastq.gz | zr9929_31V1V3_R2.fastq.gz
F9929.S32 | original sample ID here | zr9929_32V1V3_R1.fastq.gz | zr9929_32V1V3_R2.fastq.gz
F9929.S33 | original sample ID here | zr9929_33V1V3_R1.fastq.gz | zr9929_33V1V3_R2.fastq.gz
F9929.S34 | original sample ID here | zr9929_34V1V3_R1.fastq.gz | zr9929_34V1V3_R2.fastq.gz
F9929.S35 | original sample ID here | zr9929_35V1V3_R1.fastq.gz | zr9929_35V1V3_R2.fastq.gz
F9929.S03 | original sample ID here | zr9929_3V1V3_R1.fastq.gz | zr9929_3V1V3_R2.fastq.gz
F9929.S04 | original sample ID here | zr9929_4V1V3_R1.fastq.gz | zr9929_4V1V3_R2.fastq.gz
F9929.S05 | original sample ID here | zr9929_5V1V3_R1.fastq.gz | zr9929_5V1V3_R2.fastq.gz
F9929.S06 | original sample ID here | zr9929_6V1V3_R1.fastq.gz | zr9929_6V1V3_R2.fastq.gz
F9929.S07 | original sample ID here | zr9929_7V1V3_R1.fastq.gz | zr9929_7V1V3_R2.fastq.gz
F9929.S08 | original sample ID here | zr9929_8V1V3_R1.fastq.gz | zr9929_8V1V3_R2.fastq.gz
F9929.S09 | original sample ID here | zr9929_9V1V3_R1.fastq.gz | zr9929_9V1V3_R2.fastq.gz
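
For readers who want to inspect the downloaded files programmatically, the sketch below shows one possible way to pair the R1 and R2 files by their shared name prefix and to iterate over the records of a gzip-compressed FASTQ file using only the Python standard library. The function names are hypothetical, and the reader assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines).

```python
import gzip
import re
from itertools import islice

def pair_fastq_files(names):
    """Group file names as {prefix: {"R1": name, "R2": name}}."""
    pairs = {}
    for name in names:
        match = re.fullmatch(r"(.+)_(R[12])\.fastq\.gz", name)
        if match:
            pairs.setdefault(match.group(1), {})[match.group(2)] = name
    return pairs

def read_fastq(path):
    """Yield (read_id, sequence, quality_string) from a .fastq.gz file.

    Assumes each record occupies exactly four lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            header, seq, _plus, qual = (line.rstrip("\n") for line in record)
            yield header[1:], seq, qual  # drop the leading "@"

pairs = pair_fastq_files([
    "zr9929_10V1V3_R1.fastq.gz",
    "zr9929_10V1V3_R2.fastq.gz",
])
print(pairs)
```

Each quality character encodes a Phred score; on current Illumina instruments the score is typically the character's ASCII code minus 33.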

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Raw sequence data download link:

 

VI. Analysis - DADA2 Read Processing

What is DADA2?

DADA2 is a software package that models and corrects Illumina-sequenced amplicon errors. DADA2 infers sample sequences exactly, without coarse-graining into OTUs, and resolves differences of as little as one nucleotide. DADA2 identified more real variants and output fewer spurious sequences than other methods.

DADA2’s advantage is that it uses more of the data. The DADA2 error model incorporates quality information, which is ignored by all other methods after filtering. The DADA2 error model incorporates quantitative abundances, whereas most other methods use abundance ranks if they use abundance at all. The DADA2 error model identifies the differences between sequences, e.g., A->C, whereas other methods merely count the mismatches. DADA2 can parameterize its error model from the data itself, rather than relying on previous datasets that may or may not reflect the PCR and sequencing protocols used in your study.

DADA2 Publication: Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016 Jul;13(7):581-3. doi: 10.1038/nmeth.3869. Epub 2016 May 23. PMID: 27214047; PMCID: PMC4927377.

The DADA2 software package is available as an R package at: https://benjjneb.github.io/dada2/index.html

Analysis Procedures:

The DADA2 pipeline includes several tools for read quality control, including quality filtering, trimming, denoising, pair merging and chimera filtering. Below are the major processing steps of DADA2:

Step 1. Read trimming based on sequence quality. The quality of NGS Illumina sequences often decreases toward the end of the reads. DADA2 allows the poor-quality read ends to be trimmed off in order to improve error model building and pair merging performance.

Step 2. Learn the error rates. The DADA2 algorithm makes use of a parametric error model (err) and every amplicon dataset has a different set of error rates. The learnErrors method learns this error model from the data, by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors).

Step 3. Infer amplicon sequence variants (ASVs) based on the error model built in previous step. This step is also called sequence "denoising". The outcome of this step is a list of ASVs that are the equivalent of oligonucleotides.

Step 4. Merge paired reads. If the sequencing products are read pairs, DADA2 will merge the R1 and R2 ASVs into single sequences. Merging is performed by aligning the denoised forward reads with the reverse-complement of the corresponding denoised reverse reads, and then constructing the merged “contig” sequences. By default, merged sequences are only output if the forward and reverse reads overlap by at least 12 bases, and are identical to each other in the overlap region (but these conditions can be changed via function arguments).

Step 5. Remove chimeras. The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs. Chimeric sequences are identified if they can be exactly reconstructed by combining a left-segment and a right-segment from two more abundant “parent” sequences. The frequency of chimeric sequences varies substantially from dataset to dataset, and depends on factors including experimental procedures and sample complexity.
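
To make Steps 4 and 5 more concrete, here is a deliberately simplified Python sketch of the two ideas. It is not the DADA2 implementation: DADA2 is an R package that aligns the reads, whereas this sketch only looks for an exact suffix/prefix overlap of at least 12 bases, and the chimera check is a brute-force test of whether a sequence equals a left segment of one more abundant sequence joined to a right segment of another. All function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def merge_pair(forward, reverse, min_overlap=12):
    """Merge a read pair if the forward read and the reverse-complemented
    reverse read overlap exactly by at least min_overlap bases."""
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None  # no acceptable overlap: the pair is not merged

def is_bimera(candidate, parents):
    """True if candidate is a prefix of one parent joined to a suffix of another."""
    n = len(candidate)
    for left in parents:
        for right in parents:
            if left == right:
                continue
            for cut in range(1, n):
                if candidate[:cut] == left[:cut] and candidate[cut:] == right[-(n - cut):]:
                    return True
    return False

def flag_chimeras(abundance):
    """Flag sequences reconstructible from two more abundant sequences."""
    flagged = set()
    for seq, count in abundance.items():
        parents = [s for s, c in abundance.items() if c > count]
        if is_bimera(seq, parents):
            flagged.add(seq)
    return flagged

# Toy amplicon of 36 bases read as two 25-base reads (14-base overlap).
amplicon = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATCCTAG"
merged = merge_pair(amplicon[:25], reverse_complement(amplicon[-25:]))
chimeras = flag_chimeras({"AAAAAAAAAA": 100, "CCCCCCCCCC": 80,
                          "AAAAACCCCC": 5, "GGGGGTTTTT": 50})
print(merged == amplicon, chimeras)
```

A real chimera filter also has to guard against flagging genuine variants that differ from an abundant sequence by a single base near one end; this sketch makes no such attempt.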

Results

1. Read Quality Plots. NGS sequence analysis starts with visualizing the quality of the sequencing. Below are the quality plots of the first sample for the R1 and R2 reads separately. The gray-scale heat map shows the frequency of each quality score at each base position. The mean quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines. The forward reads are usually of better quality. It is common practice to trim the last few nucleotides to avoid less well-controlled errors that can arise there. The trimming affects the downstream steps, including error model building, merging and chimera calling. FOMC uses an empirical approach that tests many combinations of trim lengths in order to achieve the best final amplicon sequence variants (ASVs); see the next section, “Optimal trim length for ASVs”.

Quality plots for all samples:

2. Optimal trim length for ASVs. The final number of merged and chimera-filtered ASVs depends on the quality filtering (hence trimming) at the very beginning of the DADA2 pipeline. In order to retain the highest proportion of reads as ASVs, an empirical approach was used:

  1. Create a random subset of each sample consisting of 5,000 R1 and 5,000 R2 reads (to reduce computation time)
  2. Trim 10 bases at a time from the ends of both R1 and R2, up to 50 bases
  3. For each combination of trimmed lengths (e.g., 300x300, 300x290, 290x290, etc.), run the trimmed reads through the entire DADA2 pipeline to obtain chimera-filtered merged ASVs
  4. Select the combination with the highest percentage of input reads becoming final ASVs and apply it to the complete set of data

Below is the result of such operation, showing ASV percentages of total reads for all trimming combinations (1st Column = R1 lengths in bases; 1st Row = R2 lengths in bases):

R1/R2 | 281 | 271 | 261 | 251 | 241 | 231
321 | 32.97% | 32.93% | 32.95% | 32.84% | 32.79% | 20.40%
311 | 32.77% | 32.91% | 32.83% | 32.51% | 20.68% | 11.13%
301 | 32.22% | 32.17% | 32.02% | 20.01% | 10.89% | 9.23%
291 | 46.81% | 46.70% | 34.48% | 25.63% | 23.99% | 23.80%
281 | 50.26% | 38.13% | 29.17% | 27.75% | 27.50% | 26.62%
271 | 49.31% | 40.83% | 39.58% | 39.50% | 38.42% | 38.79%

Based on the above result, the trim length combination of R1 = 281 bases and R2 = 281 bases, which gave the highest percentage in the table above (50.26%), was chosen for generating the final ASVs for all sequences. This combination converted the highest percentage of input reads into merged, non-chimeric ASVs and was used for downstream analyses, if requested.
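
The selection rule described above (pick the trim combination with the highest percentage of reads retained as ASVs) can be checked against the table with a few lines of Python. The percentages are copied from the table; the variable and function names are hypothetical.

```python
# Columns: R2 trim lengths; rows: R1 trim length -> ASV percentages.
r2_lengths = [281, 271, 261, 251, 241, 231]
asv_percent = {
    321: [32.97, 32.93, 32.95, 32.84, 32.79, 20.40],
    311: [32.77, 32.91, 32.83, 32.51, 20.68, 11.13],
    301: [32.22, 32.17, 32.02, 20.01, 10.89, 9.23],
    291: [46.81, 46.70, 34.48, 25.63, 23.99, 23.80],
    281: [50.26, 38.13, 29.17, 27.75, 27.50, 26.62],
    271: [49.31, 40.83, 39.58, 39.50, 38.42, 38.79],
}

def best_trim(table, columns):
    """Return (r1_length, r2_length, percent) for the highest percentage."""
    pct, r1, r2 = max(
        (pct, r1, r2)
        for r1, row in table.items()
        for r2, pct in zip(columns, row)
    )
    return r1, r2, pct

print(best_trim(asv_percent, r2_lengths))  # (281, 281, 50.26)
```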

3. Error plots from learning the error rates. After DADA2 builds the error model for the data set, it is always worthwhile, as a sanity check if nothing else, to visualize the estimated error rates. The error rates for each possible transition (A→C, A→G, …) are shown below. Points are the observed error rates for each consensus quality score. The black line shows the estimated error rates after convergence of the machine-learning algorithm. The red line shows the error rates expected under the nominal definition of the Q-score. Ideally, the estimated error rates (black line) are a good fit to the observed rates (points), and the error rates drop with increased quality as expected.
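
For reference, the "nominal definition of the Q-score" mentioned above is the standard Phred relationship, in which a quality score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below (function names are hypothetical) shows the conversion in both directions.

```python
import math

def phred_to_error_probability(q):
    """Nominal error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```

Under this definition, Q20 corresponds to an expected error rate of about 1 in 100 and Q30 to about 1 in 1,000.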

Forward Read R1 Error Plot


Reverse Read R2 Error Plot

PDF versions of these plots are available here:

 

4. DADA2 Result Summary. The table below summarizes the DADA2 analysis, tracking the paired read counts of each sample through all the steps of the DADA2 denoising process, including end-trimming (filtered), denoising (denoisedF, denoisedR), pair merging (merged) and chimera removal (nonchim).

Sample ID | input | filtered | denoisedF | denoisedR | merged | nonchim
F9929.S01 | 103,597 | 100,439 | 99,963 | 98,706 | 94,182 | 69,206
F9929.S02 | 116,740 | 113,376 | 113,024 | 111,786 | 108,810 | 76,864
F9929.S03 | 117,170 | 113,610 | 113,098 | 111,533 | 109,089 | 81,779
F9929.S04 | 136,489 | 132,463 | 131,937 | 130,322 | 126,514 | 92,205
F9929.S05 | 106,929 | 103,555 | 102,986 | 101,686 | 97,217 | 63,867
F9929.S06 | 121,275 | 117,564 | 116,925 | 115,326 | 111,736 | 68,910
F9929.S07 | 129,738 | 125,872 | 125,535 | 123,960 | 122,088 | 45,094
F9929.S08 | 105,930 | 102,828 | 102,242 | 100,771 | 97,149 | 48,442
F9929.S09 | 132,960 | 128,984 | 128,277 | 127,000 | 123,659 | 87,896
F9929.S10 | 128,717 | 124,744 | 124,397 | 122,567 | 119,481 | 91,200
F9929.S11 | 129,251 | 125,429 | 125,070 | 123,228 | 120,388 | 93,170
F9929.S12 | 121,208 | 117,534 | 117,207 | 115,311 | 112,672 | 82,424
F9929.S13 | 195,226 | 188,994 | 186,577 | 184,480 | 173,354 | 101,304
F9929.S14 | 228,681 | 221,737 | 220,469 | 217,399 | 208,728 | 173,893
F9929.S15 | 259,737 | 251,771 | 250,103 | 246,753 | 235,090 | 169,416
F9929.S16 | 235,556 | 228,036 | 226,981 | 223,141 | 213,546 | 173,465
F9929.S17 | 243,008 | 235,654 | 227,326 | 231,122 | 212,393 | 112,199
F9929.S18 | 236,306 | 229,157 | 228,357 | 224,366 | 214,580 | 156,859
F9929.S19 | 189,558 | 183,561 | 182,380 | 179,903 | 173,305 | 131,286
F9929.S20 | 245,038 | 237,245 | 233,319 | 233,168 | 218,280 | 143,609
F9929.S21 | 167,027 | 161,816 | 160,978 | 158,343 | 150,775 | 93,560
F9929.S22 | 239,237 | 231,996 | 231,558 | 227,399 | 214,118 | 183,957
F9929.S23 | 250,362 | 242,651 | 239,370 | 238,675 | 223,412 | 136,494
F9929.S24 | 264,183 | 255,746 | 254,964 | 250,948 | 235,594 | 188,397
F9929.S25 | 253,800 | 246,098 | 244,956 | 241,066 | 232,010 | 155,274
F9929.S26 | 249,152 | 241,527 | 240,628 | 236,657 | 226,523 | 185,260
F9929.S27 | 209,026 | 202,350 | 199,871 | 198,425 | 188,353 | 155,165
F9929.S28 | 211,428 | 204,939 | 201,483 | 200,927 | 188,495 | 96,999
F9929.S29 | 277,096 | 268,477 | 267,622 | 263,209 | 251,261 | 198,782
F9929.S30 | 237,468 | 229,916 | 229,285 | 225,664 | 213,382 | 182,460
F9929.S31 | 232,651 | 225,379 | 222,120 | 221,418 | 205,430 | 122,128
F9929.S32 | 211,750 | 205,154 | 204,497 | 201,233 | 192,204 | 142,423
F9929.S33 | 239,236 | 231,610 | 229,385 | 226,605 | 215,698 | 154,444
F9929.S34 | 219,307 | 212,298 | 208,431 | 208,859 | 190,420 | 118,516
F9929.S35 | 228,501 | 221,478 | 220,723 | 216,999 | 207,250 | 131,743
Total | 6,773,338 | 6,563,988 | 6,512,044 | 6,438,955 | 6,127,186 | 4,308,690
Percentage | 100.00% | 96.91% | 96.14% | 95.06% | 90.46% | 63.61%
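
The percentages reported in the summary table are each step's total read count divided by the total input read count. The following small Python sketch reproduces them from the totals given above; the dictionary and function names are hypothetical.

```python
# Totals across all samples, copied from the DADA2 summary table.
totals = {
    "input": 6_773_338,
    "filtered": 6_563_988,
    "denoisedF": 6_512_044,
    "denoisedR": 6_438_955,
    "merged": 6_127_186,
    "nonchim": 4_308_690,
}

def retention(step_totals, baseline="input"):
    """Percentage of baseline reads remaining after each step (2 decimals)."""
    base = step_totals[baseline]
    return {step: round(100 * n / base, 2) for step, n in step_totals.items()}

for step, pct in retention(totals).items():
    print(f"{step:>10}: {pct:6.2f}%")
```

The same function can be applied to a single sample's column to see where that sample loses most of its reads, for example at the chimera removal step.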

This table can be downloaded as an Excel table below:

 

5. DADA2 Amplicon Sequence Variants (ASVs). A total of 6905 unique merged and chimera-free ASV sequences were identified, and their corresponding read counts for each sample are available in the "ASV Read Count Table", with rows for the ASV sequences and columns for the samples. This read count table can be used for microbial profile comparison among different samples, and the sequences provided in the table can be used for taxonomy assignment.

 

The table can be downloaded from this link:

 
 

Sample Meta Information

Download Sample Meta Information
#SampleID | Sample_Name | Group | Group1 | Group2 | Group3 | Group4 | Source | Term_Antibiotics | Day_of_Life
F9928.S01 | 151M | PT Mother ATB Baby No ATB | Saliva Mother | Saliva Mother | Saliva Preterm Mother, B(-) | Group I - Preterm Mother, B (-) | NA | NA | NA
F9928.S02 | 155M | PT Mother ATB Baby No ATB | Saliva Mother | Saliva Mother | Saliva Preterm Mother, B(-) | Group I - Preterm Mother, B (-) | NA | NA | NA
F9928.S03 | 161M | PT Mother ATB Baby No ATB | Saliva Mother | Saliva Mother | Saliva Preterm Mother, B(-) | Group I - Preterm Mother, B (-) | NA | NA | NA
F9928.S04 | 171M | PT Mother ATB Baby ATB | Saliva Mother | Saliva Mother | Saliva Preterm Mother, B(+) | Group II -Preterm Mother, B(+) | NA | NA | NA
F9928.S05 | 177M | PT Mother ATB Baby ATB | Saliva Mother | Saliva Mother | Saliva Preterm Mother, B(+) | Group II -Preterm Mother, B(+) | NA | NA | NA
F9928.S06 | 182M | PT Mother ATB Baby ATB | Saliva Mother | Saliva Mother | Saliva Preterm Mother, B(+) | Group II -Preterm Mother, B(+) | NA | NA | NA
F9928.S07 | 9M | OT Mother ATB Baby No ATB | Saliva Mother | Saliva Mother | Saliva On-term Mother, B(-) | Group III -On-term Mother, B(-) | NA | NA | NA
F9928.S08 | 13M | OT Mother ATB Baby No ATB | Saliva Mother | Saliva Mother | Saliva On-term Mother, B(-) | Group III -On-term Mother, B(-) | NA | NA | NA
F9928.S09 | 21M | OT Mother ATB Baby No ATB | Saliva Mother | Saliva Mother | Saliva On-term Mother, B(-) | Group III -On-term Mother, B(-) | NA | NA | NA
F9928.S10 | 151.T0S | PT Mother ATB Baby ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva Preterm Baby (+) | Group I - Preterm Mother, B (-) | Saliva | Preterm Baby (+) | Up to 10
F9928.S11 | 151.T1S | PT Mother ATB Baby ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva Preterm Baby (+) | NA | Saliva | Preterm Baby (+) | 30
F9928.S12 | 151.T2S | PT Mother ATB Baby ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva Preterm Baby (+) | NA | Saliva | Preterm Baby (+) | 60
F9928.S13 | 155.T0S | PT Mother ATB Baby No ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva Preterm Baby (-) | Group I - Preterm Mother, B (-) | Saliva | Preterm Baby (-) | Up to 10
F9928.S14 | 155.T1S | PT Mother ATB Baby No ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva Preterm Baby (-) | NA | Saliva | Preterm Baby (-) | 30
F9928.S15 | 155.T2S | PT Mother ATB Baby No ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva Preterm Baby (-) | NA | Saliva | Preterm Baby (-) | 60
F9928.S16 | 161.T0S | PT Mother ATB Baby ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva Preterm Baby (+) | Group I - Preterm Mother, B (-) | Saliva | Preterm Baby (+) | Up to 10
F9928.S17 | 161.T1S | PT Mother ATB Baby ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva Preterm Baby (+) | NA | Saliva | Preterm Baby (+) | 30
F9928.S18 | 161.T2S | PT Mother ATB Baby ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva Preterm Baby (+) | NA | Saliva | Preterm Baby (+) | 60
F9928.S19 | 171.T0S | PT Mother ATB Baby No ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva Preterm Baby (-) | Group II -Preterm Mother, B(+) | Saliva | Preterm Baby (-) | Up to 10
F9928.S20 | 171.T1S | PT Mother ATB Baby No ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva Preterm Baby (-) | NA | Saliva | Preterm Baby (-) | 30
F9928.S21 | 171.T2S | PT Mother ATB Baby No ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva Preterm Baby (-) | NA | Saliva | Preterm Baby (-) | 60
F9928.S22 | 177.T0S | PT Mother ATB Baby ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva Preterm Baby (+) | Group II -Preterm Mother, B(+) | Saliva | Preterm Baby (+) | Up to 10
F9928.S23 | 177.T1S | PT Mother ATB Baby ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva Preterm Baby (+) | NA | Saliva | Preterm Baby (+) | 30
F9928.S24 | 177.T2S | PT Mother ATB Baby ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva Preterm Baby (+) | NA | Saliva | Preterm Baby (+) | 60
F9928.S25 | 182.T0S | PT Mother ATB Baby No ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva Preterm Baby (-) | Group II -Preterm Mother, B(+) | Saliva | Preterm Baby (-) | Up to 10
F9928.S26 | 182.T1S | PT Mother ATB Baby No ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva Preterm Baby (-) | NA | Saliva | Preterm Baby (-) | 30
F9928.S27 | 182.T2S | PT Mother ATB Baby No ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva Preterm Baby (-) | NA | Saliva | Preterm Baby (-) | 60
F9928.S28 | 9.T0S | OT Mother ATB Baby No ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva On-term Baby (-) | Group III -On-term Mother, B(-) | Saliva | On-term Baby (-) | Up to 10
F9928.S29 | 9.T1S | OT Mother ATB Baby No ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva On-term Baby (-) | NA | Saliva | On-term Baby (-) | 30
F9928.S30 | 9.T2S | OT Mother ATB Baby No ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva On-term Baby (-) | NA | Saliva | On-term Baby (-) | 60
F9928.S31 | 13.T0S | OT Mother ATB Baby No ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva On-term Baby (-) | Group III -On-term Mother, B(-) | Saliva | On-term Baby (-) | Up to 10
F9928.S32 | 13.T1S | OT Mother ATB Baby No ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva On-term Baby (-) | NA | Saliva | On-term Baby (-) | 30
F9928.S33 | 13.T2S | OT Mother ATB Baby No ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva On-term Baby (-) | NA | Saliva | On-term Baby (-) | 60
F9928.S34 | 21.T0S | OT Mother ATB Baby No ATB.T0 | Saliva Baby.T0 | Saliva Baby | Saliva On-term Baby (-) | Group III -On-term Mother, B(-) | Saliva | On-term Baby (-) | Up to 10
F9928.S35 | 21.T1S | OT Mother ATB Baby No ATB.T1 | Saliva Baby.T1 | Saliva Baby | Saliva On-term Baby (-) | NA | Saliva | On-term Baby (-) | 30
F9928.S36 | 21.T2S | OT Mother ATB Baby No ATB.T2 | Saliva Baby.T2 | Saliva Baby | Saliva On-term Baby (-) | NA | Saliva | On-term Baby (-) | 60
F9928.S37 | 151.T0F | PT Mother ATB Baby ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool Preterm Baby (+) | Group I - Preterm Mother, B (-) | Fecal | Preterm Baby (+) | Up to 10
F9928.S38 | 151.T1F | PT Mother ATB Baby ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool Preterm Baby (+) | NA | Fecal | Preterm Baby (+) | 30
F9928.S39 | 151.T2F | PT Mother ATB Baby ATB.T2 | Fecal Baby.T2 | Fecal Baby | Stool Preterm Baby (+) | NA | Fecal | Preterm Baby (+) | 60
F9928.S40 | 155.T0F | PT Mother ATB Baby No ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool Preterm Baby (-) | Group I - Preterm Mother, B (-) | Fecal | Preterm Baby (-) | Up to 10
F9928.S41 | 155.T1F | PT Mother ATB Baby No ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool Preterm Baby (-) | NA | Fecal | Preterm Baby (-) | 30
F9928.S42 | 155.T2F | PT Mother ATB Baby No ATB.T | Fecal Baby.T2 | Fecal Baby | Stool Preterm Baby (-) | NA | Fecal | Preterm Baby (-) | 60
F9928.S43 | 161.T0F | PT Mother ATB Baby ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool Preterm Baby (+) | Group I - Preterm Mother, B (-) | Fecal | Preterm Baby (+) | Up to 10
F9928.S44 | 161.T1F | PT Mother ATB Baby ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool Preterm Baby (+) | NA | Fecal | Preterm Baby (+) | 30
F9928.S45 | 161.T2F | PT Mother ATB Baby ATB.T2 | Fecal Baby.T2 | Fecal Baby | Stool Preterm Baby (+) | NA | Fecal | Preterm Baby (+) | 60
F9928.S46 | 171.T0F | PT Mother ATB Baby No ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool Preterm Baby (-) | Group II -Preterm Mother, B(+) | Fecal | Preterm Baby (-) | Up to 10
F9928.S47 | 171.T1F | PT Mother ATB Baby No ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool Preterm Baby (-) | NA | Fecal | Preterm Baby (-) | 30
F9928.S48 | 171.T2F | PT Mother ATB Baby No ATB.T | Fecal Baby.T2 | Fecal Baby | Stool Preterm Baby (-) | NA | Fecal | Preterm Baby (-) | 60
F9928.S49 | 177.T0F | PT Mother ATB Baby ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool Preterm Baby (+) | Group II -Preterm Mother, B(+) | Fecal | Preterm Baby (+) | Up to 10
F9928.S50 | 177.T1F | PT Mother ATB Baby ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool Preterm Baby (+) | NA | Fecal | Preterm Baby (+) | 30
F9928.S51 | 177.T2F | PT Mother ATB Baby ATB.T2 | Fecal Baby.T2 | Fecal Baby | Stool Preterm Baby (+) | NA | Fecal | Preterm Baby (+) | 60
F9928.S52 | 182.T0F | PT Mother ATB Baby No ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool Preterm Baby (-) | Group II -Preterm Mother, B(+) | Fecal | Preterm Baby (-) | Up to 10
F9928.S53 | 182.T1F | PT Mother ATB Baby No ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool Preterm Baby (-) | NA | Fecal | Preterm Baby (-) | 30
F9928.S54 | 182.T2F | PT Mother ATB Baby No ATB.T | Fecal Baby.T2 | Fecal Baby | Stool Preterm Baby (-) | NA | Fecal | Preterm Baby (-) | 60
F9928.S55 | 9.T0F | OT Mother ATB Baby No ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool On-term Baby (-) | Group III -On-term Mother, B(-) | Fecal | On-term Baby (-) | Up to 10
F9928.S56 | 9.T1F | OT Mother ATB Baby No ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool On-term Baby (-) | NA | Fecal | On-term Baby (-) | 30
F9928.S57 | 9.T2F | OT Mother ATB Baby No ATB.T2 | Fecal Baby.T2 | Fecal Baby | Stool On-term Baby (-) | NA | Fecal | On-term Baby (-) | 60
F9928.S58 | 13.T0F | OT Mother ATB Baby No ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool On-term Baby (-) | Group III -On-term Mother, B(-) | Fecal | On-term Baby (-) | Up to 10
F9928.S59 | 13.T1F | OT Mother ATB Baby No ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool On-term Baby (-) | NA | Fecal | On-term Baby (-) | 30
F9928.S60 | 13.T2F | OT Mother ATB Baby No ATB.T2 | Fecal Baby.T2 | Fecal Baby | Stool On-term Baby (-) | NA | Fecal | On-term Baby (-) | 60
F9928.S61 | 21.T0F | OT Mother ATB Baby No ATB.T0 | Fecal Baby.T0 | Fecal Baby | Stool On-term Baby (-) | Group III -On-term Mother, B(-) | Fecal | On-term Baby (-) | Up to 10
F9928.S62 | 21.T1F | OT Mother ATB Baby No ATB.T1 | Fecal Baby.T1 | Fecal Baby | Stool On-term Baby (-) | NA | Fecal | On-term Baby (-) | 30
F9928.S63 | 21.T2F | OT Mother ATB Baby No ATB.T2 | Fecal Baby.T2 | Fecal Baby | Stool On-term Baby (-) | NA | Fecal | On-term Baby (-) | 60
 
 

ASV Read Counts by Samples

#Sample ID | Read Count
F9928.S63 | 126,116
F9928.S61 | 144,445
F9928.S19 | 147,851
F9928.S60 | 149,623
F9928.S27 | 154,903
F9928.S22 | 155,023
F9928.S07 | 162,140
F9928.S23 | 162,415
F9928.S31 | 163,850
F9928.S05 | 165,210
F9928.S02 | 166,612
F9928.S10 | 172,828
F9928.S21 | 177,339
F9928.S32 | 178,722
F9928.S46 | 182,715
F9928.S28 | 183,914
F9928.S30 | 187,604
F9928.S38 | 192,114
F9928.S29 | 194,669
F9928.S20 | 196,172
F9928.S12 | 200,517
F9928.S40 | 200,729
F9928.S06 | 200,931
F9928.S15 | 205,510
F9928.S58 | 205,616
F9928.S39 | 206,591
F9928.S24 | 207,427
F9928.S54 | 209,789
F9928.S50 | 216,864
F9928.S13 | 217,861
F9928.S59 | 218,958
F9928.S37 | 219,762
F9928.S42 | 220,809
F9928.S56 | 221,259
F9928.S43 | 222,830
F9928.S55 | 223,293
F9928.S34 | 223,512
F9928.S01 | 224,969
F9928.S62 | 225,678
F9928.S16 | 226,814
F9928.S35 | 229,264
F9928.S08 | 230,461
F9928.S53 | 232,780
F9928.S14 | 233,987
F9928.S26 | 235,048
F9928.S17 | 235,650
F9928.S04 | 239,119
F9928.S03 | 239,484
F9928.S18 | 252,775
F9928.S25 | 256,162
F9928.S51 | 258,263
F9928.S52 | 258,523
F9928.S48 | 259,951
F9928.S11 | 266,752
F9928.S09 | 278,839
F9928.S45 | 281,141
F9928.S33 | 282,424
F9928.S36 | 283,528
F9928.S47 | 314,122
F9928.S44 | 326,174
F9928.S41 | 328,223
F9928.S49 | 334,482
F9928.S57 | 375,767
 
 
 

VII. Analysis - Read Taxonomy Assignment

Read Taxonomy Assignment - Methods

 

The species-level, open-reference 16S rRNA NGS reads taxonomy assignment pipeline

Version 20210310
 

1. Raw sequence reads in FASTA format were BLASTN-searched against a combined set of 16S rRNA reference sequences. The set consists of MOMD (version 0.1), HOMD (version 15.2 http://www.homd.org/index.php?name=seqDownload&file&type=R ), HOMD 16S rRNA RefSeq Extended Version 1.1 (EXT), GreenGene Gold (GG) (http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/gold_strains_gg16S_aligned.fasta.gz), and the NCBI 16S rRNA reference sequence set (https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz). These sequences were screened and combined to remove short sequences (<1000 nt), chimeras, duplicates and sub-sequences, as well as sequences with poor taxonomy annotation (e.g., without species information). This process resulted in 1,015 from HOMD V15.22, 495 from EXT, 3,940 from GG and 18,044 from NCBI, a total of 25,120 sequences. Altogether these sequences represent a total of 15,601 oral and non-oral microbial species.

NCBI BLASTN version 2.7.1+ (Zhang et al, 2000) was used with the default parameters. Reads with ≥ 98% sequence identity to the matched reference and ≥ 90% alignment length (i.e., ≥ 90% of the read length was aligned to the reference and was used to calculate the sequence percent identity) were classified based on the taxonomy of the reference sequence with the highest sequence identity. If a read matched reference sequences representing more than one species with equal percent identity and alignment length, it was subject to chimera checking with the USEARCH program version v8.1.1861 (Edgar 2010). Non-chimeric reads with multi-species best hits were considered valid and were assigned a unique species notation (e.g., spp) denoting unresolvable multiple species.
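
The assignment rule in the paragraph above can be illustrated with the following Python sketch. It is not the pipeline's code: the Hit record, the function name and the example hits are hypothetical, the chimera check with USEARCH is omitted, and "best" is taken here to mean the highest (identity, aligned length) pair among hits that pass both thresholds.

```python
from collections import namedtuple

# One BLASTN hit for a read: matched species, percent identity, aligned bases.
Hit = namedtuple("Hit", "species identity aligned_length")

def classify_read(read_length, hits, min_identity=98.0, min_coverage=90.0):
    """Return the sorted list of best-hit species for a read.

    Empty list: unassigned (no hit passes both thresholds).
    One species: unique species-level assignment.
    Several species: unresolvable multi-species best hit.
    """
    passing = [
        h for h in hits
        if h.identity >= min_identity
        and 100.0 * h.aligned_length / read_length >= min_coverage
    ]
    if not passing:
        return []
    best = max((h.identity, h.aligned_length) for h in passing)
    return sorted({h.species for h in passing
                   if (h.identity, h.aligned_length) == best})

# Hypothetical hits for a 460-base read.
hits = [
    Hit("Roseburia hominis", 99.0, 450),
    Hit("Roseburia faecis", 99.0, 450),
    Hit("Blautia faecis", 97.0, 460),
]
print(classify_read(460, hits))
```

Reads for which the function returns an empty list would go on to the de novo OTU step described in the next item.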

2. Unassigned reads (i.e., reads with < 98% identity or < 90% alignment length) were pooled together and reads < 200 bases were removed. The remaining reads were subject to de novo operational taxonomic unit (OTU) calling and chimera checking using the USEARCH program version v8.1.1861 (Edgar 2010). The de novo OTU calling and chimera checking were done using 98% as the sequence identity cutoff, i.e., the species-level OTU. This step produced species-level de novo clustered OTUs with 98% identity. Representative reads from each of the OTUs/species were then BLASTN-searched against the same reference sequence set again to determine the closest species for these potential novel species. These potential novel species were pooled together with the reads that were assigned to species level in the previous step, for downstream analyses.

Reference:
Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12. PubMed PMID: 20709691.

3. Designations used in the taxonomy:

	1) Taxonomy levels are indicated by these prefixes:
	
	   k__: domain/kingdom
	   p__: phylum
	   c__: class
	   o__: order
	   f__: family
	   g__: genus  
	   s__: species
	
	   Example: 
	
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__faecis
		
	2) Unique level identified – known species:
	   
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis
	
	   The above example shows some reads match to a single species (all levels are unique)
	
	3) Non-unique level identified – known species:

	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3
	   
	   The above example “s__multispecies_spp123_3” indicates that certain reads match equally to 3 species of the 
	   genus Roseburia; “spp123” is a temporarily assigned species ID.
	
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__multigenus;s__multispecies_spp234_5
	   
	   The above example indicates that certain reads match equally to 5 different species, which belong to multiple genera; 
	   “spp234” is a temporarily assigned species ID.
	
	4) Unique level identified – unknown species, potential novel species:
	   
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis_nov_97%
	   
	   The above example indicates that some reads have no match to any of the reference sequences with 
	   sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. However, this group 
	   of reads (actually the representative read from a de novo OTU) has 97% identity to 
	   Roseburia hominis; thus it is a potential novel species, closest to Roseburia hominis 
	   (but not the same species).
	
	5) Multiple level identified – unknown species, potential novel species:
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_sppn123_3_nov_96%
	
	   The above example indicates that some reads have no match to any of the reference sequences 
	   with sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. 
	   However, this group of reads (actually the representative read from a de novo OTU) 
	   has 96% identity equally to 3 species in Roseburia. Thus there is no single 
	   closest species; instead this group of reads matches equally to multiple species at 96%. 
	   Since the reads passed the chimera check, they represent a potential novel species. “sppn123” is a 
	   temporary ID for this potential novel species. 
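
The designations listed above are regular enough to be parsed mechanically. The following is a minimal sketch, assuming lineage strings follow exactly the patterns shown in the examples (including the decimal percentages seen in the read count table, e.g. `sciuri_nov_96.743%`); the function name and the keys of the returned dictionary are made up for illustration.

```python
import re

# Rank prefixes as listed in designation 1).
LEVELS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
          "f": "family", "g": "genus", "s": "species"}

# multispecies_<temporary ID>_<number of species>, e.g. multispecies_spp123_3
MULTI = re.compile(r"^multispecies_(sppn?\d+)_(\d+)")
# trailing _nov_<percent identity>%, e.g. hominis_nov_97%
NOVEL = re.compile(r"_nov_(\d+(?:\.\d+)?)%$")


def parse_taxonomy(lineage):
    """Split a lineage string into ranks and decode the species designation."""
    ranks = {}
    for field in lineage.split(";"):
        prefix, _, name = field.partition("__")
        ranks[LEVELS[prefix.strip()]] = name.strip()
    species = ranks["species"]
    novel = NOVEL.search(species)
    multi = MULTI.match(species)
    return {
        "ranks": ranks,
        # Percent identity to the closest reference, or None for known species.
        "novel_identity": float(novel.group(1)) if novel else None,
        # Temporary ID and number of equally matched species, if any.
        "temp_id": multi.group(1) if multi else None,
        "n_species": int(multi.group(2)) if multi else 1,
    }
```

For instance, a species field of `multispecies_sppn123_3_nov_96%` is decoded as a potential novel species (96% identity) matching 3 species equally, with temporary ID `sppn123`.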

 
4. The taxonomy assignment algorithm is illustrated in the flowchart below:
 
 
 
 

Read Taxonomy Assignment - Result Summary *

Code  Category                                          MPC=0% (>=1 read)   MPC=0.01% (>=160 reads)
A     Total reads                                       13,896,903          13,896,903
B     Total assigned reads                              1,603,734           1,603,734
C     Assigned reads in species with read count < MPC   0                   0
D     Assigned reads in samples with read count < 500   0                   0
E     Total samples                                     63                  63
F     Samples with reads >= 500                         63                  63
G     Samples with reads < 500                          0                   0
H     Total assigned reads used for analysis (B-C-D)    1,603,734           1,603,734
I     Reads assigned to single species                  1,303,637           1,303,637
J     Reads assigned to multiple species                195,974             195,974
K     Reads assigned to novel species                   104,123             104,123
L     Total number of species                           181                 181
M     Number of single species                          112                 112
N     Number of multi-species                           6                   6
O     Number of novel species                           63                  63
P     Total unassigned reads                            12,293,169          12,293,169
Q     Chimeric reads                                    32,466              32,466
R     Reads without BLASTN hits                         1,413,578           1,413,578
S     Others: short, low quality, singletons, etc.      10,847,125          10,847,125

A=B+P=C+D+H+Q+R+S
E=F+G
B=C+D+H
H=I+J+K
L=M+N+O
P=Q+R+S
* MPC = minimal read count per species, expressed as a percentage of all assigned reads; species with read count < MPC were removed.
* Samples with reads < 500 were removed from downstream analyses.
* The assignment result from MPC=0.01% was used in the downstream analyses.
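
The identities listed under the summary table can be checked directly against the reported numbers. The short sketch below does so, and also shows where the ">=160 reads" threshold comes from: 0.01% of the 1,603,734 assigned reads. The function names are illustrative, and truncating the threshold to a whole number is an assumption; the report does not state how it is rounded.

```python
# Values copied from the result summary table (both MPC columns are identical).
summary = {
    "A": 13_896_903, "B": 1_603_734, "C": 0, "D": 0,
    "E": 63, "F": 63, "G": 0, "H": 1_603_734,
    "I": 1_303_637, "J": 195_974, "K": 104_123,
    "L": 181, "M": 112, "N": 6, "O": 63,
    "P": 12_293_169, "Q": 32_466, "R": 1_413_578, "S": 10_847_125,
}


def check_summary(s):
    """Evaluate each identity listed under the table; True means it holds."""
    return {
        "A=B+P": s["A"] == s["B"] + s["P"],
        "A=C+D+H+Q+R+S": s["A"] == s["C"] + s["D"] + s["H"] + s["Q"] + s["R"] + s["S"],
        "E=F+G": s["E"] == s["F"] + s["G"],
        "B=C+D+H": s["B"] == s["C"] + s["D"] + s["H"],
        "H=I+J+K": s["H"] == s["I"] + s["J"] + s["K"],
        "L=M+N+O": s["L"] == s["M"] + s["N"] + s["O"],
        "P=Q+R+S": s["P"] == s["Q"] + s["R"] + s["S"],
    }


def mpc_min_reads(assigned_reads, mpc_percent):
    """Read count corresponding to an MPC percentage of all assigned reads."""
    return assigned_reads * mpc_percent / 100


print(check_summary(summary))
print(int(mpc_min_reads(summary["B"], 0.01)))  # truncation is an assumption
```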
 
 
 

Read Taxonomy Assignment - ASV Species-Level Read Counts Table

This table shows the read counts for each sample (columns) and each species identified based on the ASV sequences. The downstream analyses were based on this table.
SPID	Taxonomy	F9928.S01	F9928.S02	F9928.S03	F9928.S04	F9928.S05	F9928.S06	F9928.S07	F9928.S08	F9928.S09	F9928.S10	F9928.S11	F9928.S12	F9928.S13	F9928.S14	F9928.S15	F9928.S16	F9928.S17	F9928.S18	F9928.S19	F9928.S20	F9928.S21	F9928.S22	F9928.S23	F9928.S24	F9928.S25	F9928.S26	F9928.S27	F9928.S28	F9928.S29	F9928.S30	F9928.S31	F9928.S32	F9928.S33	F9928.S34	F9928.S35	F9928.S36	F9928.S37	F9928.S38	F9928.S39	F9928.S40	F9928.S41	F9928.S42	F9928.S43	F9928.S44	F9928.S45	F9928.S46	F9928.S47	F9928.S48	F9928.S49	F9928.S50	F9928.S51	F9928.S52	F9928.S53	F9928.S54	F9928.S55	F9928.S56	F9928.S57	F9928.S58	F9928.S59	F9928.S60	F9928.S61	F9928.S62	F9928.S63
SP1Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;faecalis106707152184492615862043223936901137817544423671821876185112038612712398294625091786622711148290941502063330353163154699934101111817335376812251243464315834454126358587964405439912393238816937791577223539951865582916551229944697456371271
SP10Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;hominis362209147191109155159123382550295401279561590306361344166261284131453184235579145194143212149319499284444660277273947010802002721458331358330480254917119917511816927886726628031713460209
SP100Bacteria;Fusobacteria;Fusobacteria;Fusobacteriales;Fusobacteriaceae;Fusobacterium;canifelinum27152002485641665262519338420113501313902563901410276232382247007210000108021109181518073641428949488117
SP101Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Klebsiella;aerogenes412609510612293882635453015300142850610691011511939609257485000000000581000000000008717233
SP102Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;breve3530027418142449422156470253001926124221074801225152150144570009410212006257123633586740213259023121328
SP103Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Glutamicibacter;protophormiae141306011515194831930179717057292624402731519782762012310026010011121629830064504131303
SP104Bacteria;Firmicutes;Clostridia;Negativicutes;Veillonellaceae;Veillonella;sp. HMT780311801718811104152325215634802634117233551125411162614276218407780001500164000561450017133122000000030
SP105Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Sutterellaceae;Parasutterella;excrementihominis47291236115163659103147655025341253613398118411122256485415527200129641500940814363868876072362079955
SP106Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;coyleae63831191813212726282620039280165386733326110615362142015401765245982400401929149001783229155933814531904687253195261471628
SP107Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;frederickii100501011004023033023120101331622341987290000000000000000000000000000
SP108Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Glutamicibacter;soli586203010711317811217301010392057850495722026261282752659408411717111312500303227004229015108556201283110373313901126312811
SP109Bacteria;Proteobacteria;Hydrogenophilalia;Hydrogenophilales;Hydrogenophilaceae;Tepidiphilus;succinatimandens64342497121132689920281131101034641216631205421109122197245355531502541171200076871200332161628127277630312555125665923916
SP11Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;aureus0000000000007320100100540000000000000500010007400000600600700000000
SP110Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Propionibacteriaceae;Cutibacterium;avidum7867153461662194012113220367111511290105922951086817811102221591812367049991502011371580030152243506972182411070228263160285617503247291611
SP111Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;matruchotii965934823392029104132154411216016015160358722610017371552633063298514226101180340242817822670125027298260180245202651441801075290232318
SP112Bacteria;Firmicutes;Clostridia;Eubacteriales;Clostridiaceae;Hathewaya;histolytica00030103900011310049300343013033020122015644049008008800000800113271501912322335415
SP113Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-6];bacterium HMT870322422673695069103038278027300102513010127214013202034841034750000000000102300000000000000
SP114Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;mucosa381818136187617364351934041385133966317212417454546365531822578346299042250021501590161233503239405120781001503214
SP115Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sputigena000000000014920000045290000068000000000000000000000000104000000000000
SP116Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Shigella;sonnei14130121125152341112532091808980312279185773181132000000000041500000000000000
SP117Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;coli166932967291810382321811825071806203604019417818741221619260001090025300221529000371007102493851714
SP118Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;mastitidis382799251134172667552426455430334701320144511686013593310353499851040020781500628131520240020161913061205
SP121Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Leptothrix;sp. HMT025921766439740000000000340000018503700030000001700000000000000000000000
SP122Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Faecalicatena;fissicatena000000000000147000005001470000000100000410111852784023600242428701830190122406
SP123Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Anaerostipes;caccae15512264014122515270770317046262034810113738000300000032200000000000000
SP124Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;thetaiotaomicron000000000000761000012927033074000003004004300120077000001400600900000040
SP125Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;propinquum3280942331730132212274303752011186071104291510925271120490001000000410402000000000000
SP14Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Enterobacter;hormaechei247233585237256209841383893447062843304043742433341126221226105914211984127638412020323624648133638145113407044215432344737510673243723344685588127783042272191881147352
SP15Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;valentinum15511919878317810467196255521111426432701111431301231155221633155264978588143541622395217735300324645727905731818425671081682173803707624323015345541417
SP16Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Ligilactobacillus;murinus2015152347710305675766327972260332722321528160033923664516185222726231242155766169713261102378677959881215947712081325713892420446675251412332166179911244985407147558258684601127148129945674071211851301726602263278358
SP17Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Intrasporangiaceae;Janibacter;cremeus136104042801729461421652784127216232171961460955921015422274612966654512117626110251470143040571802111147183012111174752808678109036273209
SP18Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;caprae830532244426171244217319802119136349988413421267297512729381488544220272250612138024939234532460773116972799116655161310913654511960223459159163176437232774114573265224197629693388308134176152
SP19Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Klebsiella;pneumoniae2200171960111156975496668922452338416881562103035493865148117222170406132115687801670998121240498159959971589107521853338170626574706963286304443503853862120479211949398601382108723243170418522998110923801251947815447291428
SP2Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;danieliae6084472812981622201152307049201219557113996310511428657017805704582174967307281110179228203399304623903733745127242833217521410604867948562754563182645683798721204155172271397828379324251204624110
SP20Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Varibaculum;cambriense2099029512749761199119217640214611824822612020519816211695623371841662556264601071421552726142752731122981055595635912625827923527215923524539553713888145277576267190164854456
SP21Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;capitis141912508167975065213536861742215818821131809232724601701463159016285697447173611498292691425695620105095513712286153219773066378049622115697615269213695236714167045621105157428445459270111821190722533234483377
SP22Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;tuberculostearicum2591817128514257782784244992001742143902043555125187773262398344765911116810129839581278545027544087214703592562269328019721724223865194252208298525297211777882
SP23Bacteria;Fusobacteria;Fusobacteria;Fusobacteriales;Fusobacteriaceae;Fusobacterium;nucleatum91166825407154167291312908147367063037314221539081388113951364825145646246116153069530556341890414297411070176111601561129138892075121523202832045623431025151113620117062193858940545918280119
SP24Bacteria;Firmicutes;Bacilli;Lactobacillales;Aerococcaceae;Facklamia;languida14912058330394457186299321411643533531721171614721971315910929425338646611396919231510122442314113645221745131296102101124581401322383303011434326304653381519
SP25Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Citrobacter;murliniae291518232213551425574132607001426029293381514595471562892664732670020631700022562113301201718130120303
SP26Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Rothia;mucilaginosa176108275125301084086265243483180952623282064212781352121837098324152332583286165120180285314260417108346238571881442993655341421087842558030123155125340362180186755954
SP27Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;sp. HMT44872422214313641843115794759411692800254143764371811323371102261815541486374135134141320562326139711017979281277724838932038109186442365623726
SP28Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Pantoea;allii351501143592933190402424202166586852415913435116092052943921550070313616304922881402124180180152105972171892830412724311
SP29Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;sicca13212511363812102003008120013290293514647000590020000048091578975062403412749120538422359136534914621
SP3Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;parasanguinis_clade_41180700382022253810112335195141701521232731384998732731622817135124011432961553312420079107620198262803128021317512369334475823868536633759374319
SP30Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT06612811510046242562772001766341431223319702282001509132753142718441146486799419231160256001114724717502441537215359172113412509638426192372189112156833441
SP31Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Methylophilaceae;Methylophilus;leisingeri61430291518121854105162348311202745164653517162121488225016272457882060122003619288195022677151652878901432123962591511891018068291545
SP32Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseriaceae_[G-1];bacterium HMT174332671710176522296158477207695413127210810634333105301595843495715117486214701201342048583122251421785810721747640362923437616635510582116683429
SP33Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Propionibacteriaceae;Cutibacterium;acnes10396621947917618127832010571582787690883170517373852103312509782234576477466184635480346693595108915736411285225929242296143150427355523034819935054831878411339920732334283862341832515277233
SP34Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Weizmannia;ginsengihumi29191154721038301722512153563413841015171078293150518113027830716102123225319301697529112635385128274158611041317271261216
SP35Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;lupini2191601741264674529925632757618639137729615819629130117814785902823883969524961692341973458623847420003069118322622846343842131872062821630109791710113810212729
SP36Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;gallinarum4262923602647517510718455474116224842787448173793615112483671356274024483912418156302339455778365623993110178155322491973015795234049919879390515712829473443473261261625689
SP37Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;lwoffii413240122265841611172315386262723333217818593532824101242633201272341372368201383814830822945765950860787532401357241627427311487111336330294136675938123120111230636271227231976791
SP38Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Enterobacter;cloacae513111340224115278394164781031261051601098464219681051995371354124175361164813827045411741715341114516718072136841501724777162155198963857478620
SP4Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;mitis4123214852111482721132714535889083175265146722893434194882952191614723894236011352091692841333345442943857583698296121912312413731854486729433413352592709255243274375799466375284147111105
SP40Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;oralis271013711184537132939036200310108431921400196312243291420837201392690018701200025611189107272592610954361247
SP41Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;durum224138010341424881193313249144261340422176198212527227118822611184563626211641491021993372562834292320644553895232151183441631022491441862401227012612927036836803527
SP42Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;pseudolongum377236483272611819323342354241028236527556182299367306221239861642493655671093211226913035549127240663811903103015947017910987188967413023474362721571065637528
SP43Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Propionibacteriaceae;Cutibacterium;namnetense1036985623212233132196190981551791911103140179867724169991312374254595491052016995280440252437555783031701126717213614317827127537211919416611898401828
SP44Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Microbacteriaceae;Microbacterium;maritypicum25130162612191940313401923938163725011253252922413295461711324232304412959691241936335438516709012062651151141021095201858915504421233712
SP45Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;ureicelerivorans75651677011551733120119405341171282096359154764716641713812429102067257112427721348810275153425028181283688681206735239168047492701436614411164
SP46Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;aurimucosum1569518026354342151220351024192702881891061241681985192301492953910668717715825522228639422327255630450328017534012989331892321406960119012242655
SP47Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Jeotgalicoccus;halotolerans130910119271074013323621836813213221223302991891471301004618122915526436287775110171278421270273400463633374752691472041041261871033374875318588109273428249219986173
SP48Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;jeikeium24423016897705570113184275471201652663150921721921517673326834072939019648248283591903795906030160191110241357819716488463610391096371296365170206544249
SP49Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Triticum;aestivum4733312022408514198200498696600365634804829573395464182353311294393232558121363318234537749370240760510315952195115977161568221243313162504242687128912198726627663917420416312191103
SP5Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;longum9314327699748625451711552551112671581740122122125380351481013117034172431074687132461272189105868360631526022699323117920239655936078114292254973633224
SP50Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Faecalibaculum;rodentium3291751172935414312420340646487189318476569117427418218223510424561225563150100146262992774819933262611525691966662572104041893683472272581664035531501631922534291621499714226968
SP51Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;subtilis659443467372152262299274807100971748250710431188299568712203304422394873942821179294198357541367747102734577813933682362052161339434398173928347138138752864583111372623883283688871067657593312161218
SP52Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;afermentans9570047111223291061647541464041611832185178134450261022774317828123714311512738952051430286321150218260672932119110181192393982471391348161251918
SP53Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Salmonella;enterica95701522823353514419743792282162340871261151008050101814220539150894311518649143299720629501660503030561553119701812550211271
SP56Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Peribacillus;frigoritolerans373325237810292588641932845862135556232927713323821210154239307512568645104081663468013671123323753479684952301301206435141111
SP57Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Nocardiaceae;Rhodococcus;qingshengii28219915213440435110323937053158188377381111322718418277633964146421741621022313386728952458021100870003564290014456250191427682111319
SP58Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;thoraltensis368256421607192263147374526330274357589554725934482182641094492021086152041782152521163285558640264613903969927160237012297811072942162493625842199225363144185116791542
SP59Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;lactarius1451091397326754544178242451101442512682106138105125914914912429437175311460163260206220332380354237079254160303188412463695041215901851351014
SP6Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;pittii432017847104062072512137281455524457168482313132313010499171948325320276581930182001001950205125429113470103144703171241
SP66Eukaryota;Streptophyta;Magnoliopsida;Ericales;Actinidiaceae;Actinidia;eriantha334223331986692259146403577562305276186430312391231231243130186253296691221182250298163426611334440799006446461730920142618820714540554536084130283197456841621
SP67Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;casseliflavus366266327230701457615344560852533525866668003153987181262891563091276721022411129114834254933449077084044583714614823614812315312918911334240464178154110350493236211737158
SP68Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;warneri134831331582869224115619932712420421921120026219814312513045178226133205421558116411281734615027315014643732270200015212816154155972362448014819110728037121017312521483
SP69Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;japonicum34224512526112785170347439680260272477516326406412303294291121389398314529110401472531263204561093406173030816182514723424711429313694264258275339891413942183773982672021046080
SP7Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;pasteuri16299159983192418417422820512020621926963952161319314849102240542583876710497162274344260357920612321238150111008110957915219621854507251184150918321922
SP70Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;johnsonii2927211866222567133341235755033530252545821153171032472133461444593804201019274080111133681124105914350022513256
SP71Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Enterobacter;mori42401235561154703430265860143601201004718404492771301136193776175382004456210001492812037925310421013221030174907
SP72Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Tannerella;serpentiformis41332226414817334459142378765550175601311076529412310356136642913223613427026061520500303102066911720435240370921
SP73Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Mammaliicoccus;sciuri1405226411043984775165170309107144144176315614712510388721431551182243911991186121923418814622052315831379701026911517596243351111121966759342175112222637
SP74Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Shigella;flexneri8042030102320326912021402871181274496830910464351091024014119336426586122302145131970264420633169010480943587501229743169491516352619917
SP75Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;kroppenstedtii100969436141622401071442818298156182012010524248316129271662611236742991522511220841002700716350017184681316030200181084745468
SP76Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;pilbarense695723910215162341791681123829412312210426413211144557361591474413224262742667914127917214522708070268371357541878813962154163300323605405930354762331123
SP77Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Mammaliicoccus;lentus76680507133034881421757312143146092100278117602980831991733894474457413814911019500221017916001244745124973814115517096621200233125379
SP78Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Mediterraneibacter;faecis60390171714121669943936230971064137611185432748692997260143729518722491147264136841715725477103104487434869654765424841245540452444
SP79Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Klebsiella;michiganensis14170300248143431924542016300191630874862141726175497615300427000303423557990007320116503
SP8Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;fergusonii3092445892171462457017939542631528425444849213352939796244927414342229390452610021913924630732352274853868916963171155140923123062115369124251318254428434112532092863751321781461052454
SP80Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;bifidum25140675121622325241334935094016567151072219752173303916293611205851046131742769029550166135515586876616912255208426201165426749
SP81Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;chosunense33280235111113314615534250485514337151928581467611661874039201431261433841021327364837715408116850879211015823011370907512225319
SP82Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Enterobacter;kobei8683011522813576178123156161621500429624548233326311883913669248115154146147196004626318002722315913529221068857476481620103113238
SP83Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae;Massilia;arenae83553343546934468287658315412611618312284051311843217123187521413264117178951603702436366548049471341824135140144205384076711581258055362033
SP84Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;sp. HMT5251610023025141951642525010200813104630901144182781827000000000064100000000000000
SP85Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Lawsonellaceae;Lawsonella;clevelandensis421401747111531593295711452671218465145572811276134102774212223132751235367115010424909411500165102513613721133101041031082264746015
SP86Bacteria;Proteobacteria;Alphaproteobacteria;Sphingomonadales;Sphingomonadaceae;Sphingomonas;leidyi3834027166111351114114239598856384918973403218751371271813130545194225102881640953345572950014751891701240180838445047426210
SP87Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;oralis6854039151215144910536351108122013872767702017164211233022541326110031821270018103102060091271325112106101143200561451470571831612
SP88Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;mutans1036305918322140112124655015137186111475127201831179415816836207363297462110167268200215930215036753412606126161032501111512033505387144063212522514
SP89Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;septicus191601718451435315864842034343111612809772133121924244714812550380242516226670157021451351411351039323911611868593719032
SP9Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;halotolerans1771513481234313744103283269627172493250310149366390150160140602213372123838627801531651822205221045324328710849484110377088167101521761622322508183150105256247228125585246
SP90Bacteria;Firmicutes;Bacilli;Lactobacillales;Aerococcaceae;Abiotrophia;defectiva135791208118481658149183437413915519206611165873344324842193117336320014016516219329741013271181579094026338544143205220342997017272889
SP91Bacteria;Proteobacteria;Gammaproteobacteria;Moraxellales;Moraxellaceae;Acinetobacter;radioresistens13302230210183100192709201411434213540710092032224008281228002204324245996302012442012118810
SP92Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;saprophyticus12340109531115321181835401371322029291993210331282047232244120435956000003700084150007459615610753485369
SP93Bacteria;Firmicutes;Clostridia;Eubacteriales;Eubacteriaceae;Pseudoramibacter;alactolyticus1327712812631144291262382084095222202470791602638942845850245652474128361391693615826700151321228027058735368876757417389650681548847165016
SP94Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;appendicis14110125461622436132384802012011181309653805144292692748000100000044300000000000000
SP95Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;pseudodiphtheriticum3424128571522498310321210276025491616529237681571018213301118182319213400220148350211060993759501335402836113734514615
SP96Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Robinsoniella;peoriensis810075330122207325230212067300418301103131742032000000000004100000000000000
SP97Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;falsenii37140280375345962117764014264102766429621007171334571333460012982700042542632501301824230811300
SP98Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactiplantibacillus;plantarum4135127671115397042713268721392034411291611122462160141717477314256712202118001450111084212403112326524080167444751119
SP99Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT31413293050271930421432112690138235245159129859833193294924440435100511431823016029239028282093198011663241897414726624051671670413328109
SPN1Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Mammaliicoccus;sciuri_nov_96.743%00017033104002004021035103010254022417130273000100000000000000000000000
SPN10Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Faecalibaculum;rodentium_nov_93.103%26064013916210119220660360022203058481661127000000000000200000000000000
SPN11Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Atopobiaceae;Olsenella;phocaeensis_nov_91.977%281101352129414113266544902133071584212545531222817371226590032851800628114281744639091760020913314
SPN12Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Anaerostipes;caccae_nov_96.266%11170714338102411817021903620522110330131111417000000000014000000000000000
SPN13Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Longibaculum;muris_nov_90.607%000000000025800009210039150043000000000000000000000000002000000000000
SPN14Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;matruchotii_nov_86.497%17206224262771401716077059200016300514141626000000000031000000000000000
SPN15Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Limosilactobacillus;reuteri_nov_93.571%0000000000000000002000350000000000000001079500020021024306011108090503
SPN16Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Klebsiella;pneumoniae_nov_91.098%1190624427116501319096011730062010833101725180022200060141763781300945004313
SPN17Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;Zavarziniaceae;Zavarzinia;aquatilis_nov_84.906%100103074004004052117033006006016110113291005464700032008110040131260182804
SPN18Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Robinsoniella;peoriensis_nov_94.563%00000000000000000000002700000000000000020697000290511180080716140121500
SPN19Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;coyleae_nov_94.908%00000000000000000000004000000000000000060611400015005250050131480130406
SPN2Bacteria;Fusobacteria;Fusobacteria;Fusobacteriales;Fusobacteriaceae;Fusobacterium;nucleatum_nov_91.853%141101164022132278533310916416157509426151082731732443700140060000727005001200000000
SPN20Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;faecalis_nov_95.255%200424545516409981255003914100755157001914091220038164803819100872063005950214590313143329422441258
SPN21Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Ligilactobacillus;murinus_nov_95.911%0000000000009400000400250000000000000280090069000003001000700000000
SPN22Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Faecalibaculum;rodentium_nov_91.098%511061113131602117190114053200724301103995920000000000024300000000000000
SPN23Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Enterobacter;mori_nov_89.749%00000000002066000003000200000005000004800220064000003001100400000000
SPN24Bacteria;Bacteroidota;Chitinophagia;Chitinophagales;Chitinophagaceae;Sediminibacterium;roseum_nov_92.115%0000000000108100000400720000000000000410080058000008001000700000000
SPN25Bacteria;Proteobacteria;Gammaproteobacteria;Chromatiales;Chromatiaceae;Rheinheimera;nanhaiensis_nov_86.051%0000000000000000000000000000000000000001000019100005220000000000000
SPN26Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;falsenii_nov_91.304%000000000030000000000000000000000000004314000200218353744002260410302
SPN27Bacteria;Bacteroidota;Bacteroidia;Bacteroidales;Muribaculaceae;Duncaniella;freteri_nov_90.977%000000000000000000000000000000000000001500000790509001402000040002000
SPN28Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;pseudolongum_nov_94.646%13110825424153311826031405710311210388894917000000000000000000000000000
SPN29Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-11];bacterium_MOT-176_nov_95.136%000000000000000000000000000000000000014001242505110034004000540000000
SPN3Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Turicibacteraceae;Turicibacter;sanguinis_nov_95.437%000810521310020600420018020120111310142428751303100000000000100000000501600000250
SPN30Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Kocuria;indica_nov_94.872%35326620419555130951174115853102603162568202483585164252802451542586649123992283103905181154268270276441839114022362307315727417956494346011818047110991051163050
SPN31Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Phycocomes;zhengii_nov_88.764%371214156135712232334132280910098130162933604121219258222100000000001085000003500000000
SPN32Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium;frigidiphilum_nov_83.878%0000000000000000002000410000000000000000259700027021222501018175020300
SPN33Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;thoraltensis_nov_95.652%0000000000000000001102407007200000400500000400000000000000000000000
SPN34Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;valentinum_nov_90.985%000000003011110000032170000049000000000000000000000000003000000000000
SPN35Bacteria;Actinobacteria;Coriobacteriia;Eggerthellales;Eggerthellaceae;Adlercreutzia;caecimuris_nov_92.843%0000000000000000000000370000000000000003762900021031613507012146060500
SPN36Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Oscillospiraceae_[G-2];bacterium_MOT-149_nov_95.427%0000000000100000000000000000000000000051500010061284386200929029520
SPN37Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Propionibacteriaceae;Cutibacterium;avidum_nov_95.072%0000000000000000002000330000000000000000051180001900111200120143112050302
SPN38Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;gallinarum_nov_91.392%13100611515210421470312041120022530172111251318000000000050000000000000000
SPN39Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;Granulicatella;elegans_nov_84.142%00000000017006900000300440000000000000300015006700000400400800000000
SPN4Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Weizmannia;ginsengihumi_nov_96.346%8806231591124016160716071120051110338141821222022300000000115200000000000000
SPN40Bacteria;Actinobacteria;Coriobacteriia;Eggerthellales;Eggerthellaceae;Eggerthella;timonensis_nov_88.867%000000000013011000025120000043000000000000000000000000104000000000000
SPN41Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;glyciniphilum_nov_88.528%000000000000871000040001300000001000003400100049000001100800400000040
SPN42Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;faecalis_nov_96.175%17170122949243772304137015201111814557621617151935481206143002676260004117162520401319170995778036
SPN43Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;argentoratense_nov_95.850%12613641002519455823121848413830526224124635418414123116661752845829948548109421392204516626519102456342812693401861165418114610012316623083901164563630636
SPN44Bacteria;Bacteroidota;Sphingobacteriia;Sphingobacteriales;Sphingobacteriaceae;Daejeonella;oryzae_nov_85.551%231171110029345066233314371433237237419412119709117612333267390675541376522131287232419101011440005400039471630001023821146617271175184556
SPN45Bacteria;Firmicutes;Clostridia;Thermoanaerobacterales;Thermodesulfobiaceae;Thermodesulfobium;acidiphilum_nov_81.729%79600532024243384151209734591461251821181186982135274185110568163371945926090143681161902430237602834235406108274416676153240334920092168119110683947173
SPN46Bacteria;Cyanobacteria;Oscillatoriophycideae;Oscillatoriophycideae;Oscillatoriales;Arthrospira;platensis_nov_88.987%11313708094333481150204238325723623308113512899350626744221541598313411213523352826125310601028114301640039163351230196431284232146846994218
SPN47Bacteria;Firmicutes;Bacilli;Bacillales;Planococcaceae;Sporosarcina;sp._MOT-205_nov_96.360%141106172193128501462663510093226252093175280925776544228762147412660132256198195334530816101171040135455513054268213213528948651406565194019
SPN48Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Brevibacteriaceae;Brevibacterium;paucivorans_nov_97.368%451411071721821112801433455208241918369143810309714953811115270601564272122358706429204191440174501643795493504007612211984338366167189455750
SPN49Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;danieliae_nov_95.594%9211621270214424731131452157326812212711730213651329652831722613743223992615213330214914223102266316522260597232612063911412615110755431024818314587682635136
SPN5Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Mediterraneibacter;glycyrrhizinilyticus_nov_95.367%0000000000000000000000000000000000000150013713002000420060006170000000
SPN50Bacteria;Firmicutes;Clostridia;Eubacteriales;Clostridiaceae;Clostridium;saccharoperbutylacetonicum_nov_95.800%553604211131736558314953292721010697114868483755691631092073040189841232881701281300385996122551674407018852490122387552910301915985
SPN51Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Microbacteriaceae;Microbacterium;saccharophilum_nov_76.981%8034031917173675113154461071500507113661561326314184143230184513681101351071820081492235003175182776115703608117894557726019
SPN52Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Ligilactobacillus;murinus_nov_97.020%000000000000000000000031000000000000000170668008329445114124024414147001198977045
SPN53Bacteria;Bacteroidota;Bacteroidia;Bacteroidales;Muribaculaceae;Duncaniella;freteri_nov_90.075%25180244410103450520247520243328335511015535011291138518456600131262500038372325601101311165120500
SPN54Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Ralstonia;solanacearum_nov_95.922%64271253411551120866929238593752264821738144121383104133015454418529741019141206704041460912915712331120911254857242615
SPN55Bacteria;Actinobacteria;Actinomycetia;Frankiales;Frankiaceae;Frankia;discariae_nov_81.481%1480902351924138172242203438089133515187041052019617180040111302009432218733717427290193814102536301017
SPN56Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Ligilactobacillus;murinus_nov_96.667%1150234671117690191201690425000181173782301113003459000840652129411915724501211003181024
SPN57Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Faecalibaculum;rodentium_nov_96.571%115053275171101022830031504860451910663102351535000000000022300000000000000
SPN58Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus;gasseri_nov_94.075%2317074822192851512934013250171610071230347205202871756000300000054700000000000000
SPN59Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;faecalis_nov_94.118%117032622920030341603603122033211085561121014011011191801012802413110130006120009000
SPN6Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;thoraltensis_nov_94.361%79032113141809024270470413202322501112111141718000000000020100000000000000
SPN7Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Propionibacteriaceae;Cutibacterium;acnes_nov_94.643%000000000000000000000000000000000000000700020600006150022720610244304127
SPN8Bacteria;Actinobacteria;Actinomycetia;Pseudonocardiales;Pseudonocardiaceae;Prauserella;oleivorans_nov_80.113%19710537736541841131202112719043500241170785342672483407201634000150000004030300484716791372632212
SPN9Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;faecalis_nov_96.685%870534031029061152407702161043100016271671619000000000021200000000000000
SPP1Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;multispecies_spp1_2136072142172536222180670241000183065572131829000000000023600000000000000
SPP2Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;multispecies_spp2_357183787116280090610891226196761669367159038261424985011110154082524781330834368131519011360255111334174913217113839197058979654281969511308837704633753799850734786121814341979159814511122166222886078917911043250121061580976561199389
SPP3Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;multispecies_spp3_2141206203312113601817011002133012113093213162920000000000002000000000000000
SPP4Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;multigenus;multispecies_spp4_25639039171431329698164678130130068771425118777323151486476716761232982165002261522000015829245428908643734421057132361045
SPP5Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp5_276360511016151990114253744181291375858012365327105822511623224533274107227218611609287277482802742875154125014263594648401031184241381261591005344
SPP6Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;multispecies_spp6_61033215134144860287169603961846892428211314412362252522738993725761494050831299597770894852227315305218484592531654936547379227238911152312792937
SPPN1Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;multispecies_sppn1_3_nov_95.362%40113812731112049810039620046130001051000002800110071000001400600500000000
SPPN2Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;multispecies_sppn2_3_nov_96.987%000000000000000000000000000000000000000100000079111019000619300000019327
SPPN3Bacteria;Bacteroidota;Chitinophagia;Chitinophagales;Chitinophagaceae;Terrimonas;multispecies_sppn3_2_nov_90.822%75574401333473271901540109613604765028502309251214313274536749424681520045561700807214010110788103196331268025155094923610
SPPN4Bacteria;Cyanobacteria;Gloeobacteria;Gloeobacterales;Gloeobacteraceae;Gloeobacter;multispecies_sppn4_2_nov_84.254%4641314261117348593920978882592031820331816527712011428143187233611521002280021900019270024213942032449170154133541
 
 
Download OTU Tables at Different Taxonomy Levels
Phylum: Count*, Relative**, CLR***
Class: Count*, Relative**, CLR***
Order: Count*, Relative**, CLR***
Family: Count*, Relative**, CLR***
Genus: Count*, Relative**, CLR***
Species: Count*, Relative**, CLR***
* Read count
** Relative abundance (count/total sample count)
*** Centered log ratio transformed abundance
 
The species listed in the table have full taxonomy and a dynamically assigned species ID specific to this report. When some reads match the reference sequences of more than one species equally well (i.e., same percent identity and alignment coverage), they cannot be assigned to a particular species. Instead, they are assigned to multiple species with the species notation "s__multispecies_spp2_2". In this notation, spp2 is the dynamic ID assigned to the reads that hit multiple species, and the "_2" at the end means there are two species in spp2.

You can look up which species are included in the multi-species assignment, in this table below:
 
 
 
 
Another type of notation is "s__multispecies_sppn2_2", in which the "n" in sppn2 means it is a potential novel species because all the reads in this species have < 98% identity to any of the reference sequences. These reads were grouped together by de novo OTU clustering at a 98% identity cutoff. A representative sequence was then chosen for a BLASTN search against the reference database to find the closest match (which will still be < 98%). This representative sequence also matched more than one species equally well, hence the "spp" in the label.
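The label conventions described above can be decoded programmatically. The following is a minimal sketch, assuming the labels follow the patterns seen in this report's table (for example "multispecies_spp1_2" or "multispecies_sppn2_3_nov_96.987%"). The regular expression and the function name `parse_multispecies` are illustrative assumptions, not part of the report pipeline.

```python
import re

# Hypothetical pattern inferred from the labels shown in this report.
LABEL = re.compile(
    r"^(?:s__)?multispecies_"              # optional "s__" rank prefix
    r"(?P<group>spp(?P<novel>n?)\d+)"      # dynamic ID, "n" marks potential novel species
    r"_(?P<n_species>\d+)"                 # number of species in the group
    r"(?:_nov_(?P<identity>\d+(?:\.\d+)?)%)?$"  # optional closest-match identity
)

def parse_multispecies(label):
    """Return the parts of a multi-species label, or None if it does not match."""
    m = LABEL.match(label)
    if m is None:
        return None
    identity = m.group("identity")
    return {
        "group_id": m.group("group"),
        "n_species": int(m.group("n_species")),
        "novel": m.group("novel") == "n",
        "closest_identity": float(identity) if identity else None,
    }

for label in ["s__multispecies_spp2_2", "multispecies_sppn2_3_nov_96.987%", "plantarum"]:
    print(label, "->", parse_multispecies(label))
```

Labels for single-species assignments (such as "plantarum") do not match the pattern and return None in this sketch.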
 
 

Taxonomy Bar Plots for All Samples

 
 

Taxonomy Bar Plots for Individual Comparison Groups

 
 
Comparison No. | Comparison Name | Families | Genera | Species
Comparison 2 | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
Comparison 3 | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
 
 

VIII. Analysis - Alpha Diversity

 

In ecology, alpha diversity (α-diversity) is the mean species diversity in sites or habitats at a local scale. The term was introduced by R. H. Whittaker[1][2] together with the terms beta diversity (β-diversity) and gamma diversity (γ-diversity). Whittaker's idea was that the total species diversity in a landscape (gamma diversity) is determined by two different things, the mean species diversity in sites or habitats at a more local scale (alpha diversity) and the differentiation among those habitats (beta diversity).


References:
Whittaker, R. H. (1960) Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30, 279–338. doi:10.2307/1943563
Whittaker, R. H. (1972). Evolution and Measurement of Species Diversity. Taxon, 21, 213-251. doi:10.2307/1218190

 

Alpha Diversity Analysis by Rarefaction

Diversity measures are affected by the sampling depth. Rarefaction is a technique to assess species richness from the results of sampling. Rarefaction allows the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. This curve is a plot of the number of species as a function of the number of samples. Rarefaction curves generally grow rapidly at first, as the most common species are found, but the curves plateau as only the rarest species remain to be sampled.
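As an illustration of the idea, a rarefaction curve can be sketched by repeatedly subsampling reads without replacement and counting how many species are observed at each depth. This is a minimal sketch with made-up counts. The function names and the choice of 20 repeats are assumptions for illustration and do not reproduce the pipeline used for this report.

```python
import random

def rarefy_richness(counts, depth, seed=0):
    """Number of species observed after drawing `depth` reads without replacement."""
    # Expand the count table into one entry per read.
    pool = [species for species, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return len(set(rng.sample(pool, depth)))

def rarefaction_curve(counts, depths, repeats=20):
    """Mean observed richness at each depth, averaged over repeated subsamples."""
    curve = []
    for depth in depths:
        values = [rarefy_richness(counts, depth, seed=r) for r in range(repeats)]
        curve.append((depth, sum(values) / repeats))
    return curve

# Hypothetical sample: two common species and three progressively rarer ones.
sample = {"sp_a": 500, "sp_b": 300, "sp_c": 150, "sp_d": 40, "sp_e": 10}
for depth, richness in rarefaction_curve(sample, [10, 50, 200, 1000]):
    print(depth, round(richness, 2))
```

At the full depth (1000 reads in this example) every species is recovered, which corresponds to the plateau of the curve.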


References:
Willis AD. Rarefaction, Alpha Diversity, and Statistics. Front Microbiol. 2019 Oct 23;10:2407. doi: 10.3389/fmicb.2019.02407. PMID: 31708888; PMCID: PMC6819366.

 
 
 

Boxplot of Alpha-diversity Indices

The two main factors taken into account when measuring diversity are richness and evenness. Richness is a measure of the number of different kinds of organisms present in a particular area. Evenness compares the similarity of the population size of each of the species present. There are many different ways to measure richness and evenness. These measurements are called "estimators" or "indices". Below are box plots of three commonly used indices, showing the values for individual samples (dots) and for groups (boxes).
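For reference, three such indices can be computed from a vector of read counts as sketched below. This is a minimal sketch under stated assumptions: the Shannon index uses the natural logarithm (some tools use base 2), and the Simpson index is taken as 1 minus the sum of squared proportions. The exact definitions used by a given software package may differ.

```python
import math

def observed_features(counts):
    """Richness: number of features with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon(counts, base=math.e):
    """Shannon index: -sum(p * log(p)) over features with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)

def simpson(counts):
    """Simpson index as 1 - sum(p^2), i.e. higher values mean more diversity."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

even = [25, 25, 25, 25]    # four equally abundant species
skewed = [97, 1, 1, 1]     # same richness, one dominant species
for name, counts in [("even", even), ("skewed", skewed)]:
    print(name, observed_features(counts), round(shannon(counts), 3), round(simpson(counts), 3))
```

Both hypothetical samples have the same richness (four observed features), but the even community scores higher on the Shannon and Simpson indices because these also reflect evenness.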

 
Alpha Diversity Box Plots for All Groups
 
 
 
 
 
 
 
Alpha Diversity Box Plots for Individual Comparisons
 
Comparison 1 | Saliva Preterm Mother, B(-) vs Saliva Preterm Mother, B(+) vs Saliva On-term Mother, B(-) | View in PDF | View in SVG
Comparison 2 | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | View in PDF | View in SVG
Comparison 3 | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | View in PDF | View in SVG
 
 
 

Group Significance of Alpha-diversity Indices

To test whether alpha diversity differs significantly among the comparison groups, we use the Kruskal Wallis H test provided by the "alpha-group-significance" function in the QIIME 2 "diversity" package. The Kruskal Wallis H test is the non-parametric alternative to one-way ANOVA. Non-parametric means that the test does not assume the data come from a particular distribution. The H test is used when the assumptions for ANOVA are not met (such as the assumption of normality). It is sometimes called the one-way ANOVA on ranks, as the ranks of the data values are used in the test rather than the actual data points. The H test determines whether the medians of two or more groups are different.
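To make the "ANOVA on ranks" idea concrete, the H statistic can be computed by hand as sketched below. This is a minimal illustration with made-up values, not the QIIME 2 implementation. The function names are hypothetical, and the closed-form p-value shown applies only to three groups (a chi-square distribution with 2 degrees of freedom).

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    values = sorted(v for g in groups for v in g)
    n = len(values)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        rank_of[values[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction

def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)

# Hypothetical Shannon index values for three groups of samples.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(groups)
print(round(h, 3), round(p_value_three_groups(h), 4))
```

For other numbers of groups, the p-value comes from a chi-square distribution with (number of groups - 1) degrees of freedom, which is usually obtained from a statistics library.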

Below are the Kruskal Wallis H test results for each comparison based on three different alpha diversity measures: 1) Observed species (features), 2) Shannon index, and 3) Simpson index.

 
 
Comparison 1. | Saliva Preterm Mother, B(-) vs Saliva Preterm Mother, B(+) vs Saliva On-term Mother, B(-) | Observed Features | Shannon Index | Simpson Index
Comparison 2. | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | Observed Features | Shannon Index | Simpson Index
Comparison 3. | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | Observed Features | Shannon Index | Simpson Index
 
 

IX. Analysis - Beta Diversity

 

NMDS and PCoA Plots

Beta diversity compares the similarity (or dissimilarity) of microbial profiles between different groups of samples. There are many different similarity/dissimilarity metrics. In general, they can be quantitative (using sequence abundance, e.g., Bray-Curtis or weighted UniFrac) or binary (considering only presence-absence of sequences, e.g., binary Jaccard or unweighted UniFrac). They can also be based on phylogeny (e.g., UniFrac metrics) or not (non-UniFrac metrics, such as Bray-Curtis).

For microbiome studies, species profiles of samples can be compared with the Bray-Curtis dissimilarity, which is based on the count data type. The pair-wise Bray-Curtis dissimilarity matrix of all samples can then be subject to either multi-dimensional scaling (MDS, also known as PCoA) or non-metric MDS (NMDS).
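As a worked illustration, the Bray-Curtis dissimilarity between two samples is the sum of absolute count differences divided by the sum of all counts in both samples. The sketch below builds a pairwise matrix for a few made-up samples. The function names and sample values are assumptions for illustration only.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

def pairwise_matrix(samples):
    """Symmetric dissimilarity matrix for a list of count vectors."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]

# Hypothetical species counts for three samples (same species order in each).
samples = [
    [10, 0, 5],
    [5, 5, 5],
    [0, 20, 0],
]
for row in pairwise_matrix(samples):
    print([round(d, 3) for d in row])
```

The values range from 0 (identical profiles) to 1 (no shared species), and the resulting matrix is the input for ordination methods such as PCoA or NMDS.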

MDS/PCoA is a scaling or ordination method that starts with a matrix of similarities or dissimilarities between a set of samples and aims to produce a low-dimensional graphical plot of the data in such a way that distances between points in the plot are close to original dissimilarities.

NMDS is similar to MDS; however, it does not use the dissimilarity values directly. Instead, it converts them into ranks and uses these ranks in the calculation.

In our beta diversity analysis, the Bray-Curtis dissimilarity matrix was first calculated and then plotted by PCoA and NMDS separately. Below are the beta diversity results for all groups together:

 
 
NMDS and PCoA Plots for All Groups
 
 
 
 
 

The above PCoA and NMDS plots are based on count data. The count data can also be transformed into centered log ratios (CLR) for each species. The CLR data are no longer count data and cannot be used in the Bray-Curtis dissimilarity calculation. Instead, CLR data can be compared with Euclidean distances. When CLR data are compared by Euclidean distance, the distance is also called the Aitchison distance.
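The CLR transform and the Aitchison distance can be sketched as follows. This is a minimal illustration, assuming a pseudocount of 1 is added to every count so that zeros can be log-transformed. The pseudocount and the function names are assumptions, and the zero-handling strategy used for this report is not stated here.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log ratio: log of each value minus the mean of the logs."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def aitchison(a, b, pseudocount=1.0):
    """Euclidean distance between the CLR-transformed vectors."""
    return math.dist(clr(a, pseudocount), clr(b, pseudocount))

# Hypothetical species counts for two samples.
sample_1 = [10, 20, 70]
sample_2 = [40, 40, 20]
print([round(x, 3) for x in clr(sample_1)])
print(round(aitchison(sample_1, sample_2), 3))
```

CLR values always sum to zero within a sample. Without a pseudocount, multiplying all counts of a sample by a constant leaves its CLR values unchanged, which is why the distance reflects composition rather than sequencing depth.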

Below are the NMDS and PCoA plots of the Aitchison distances of the samples:

 
 
 
 
 
 
 
NMDS and PCoA Plots for Individual Comparisons
 
 
Comparison No. | Comparison Name | NMDS: Bray-Curtis | NMDS: CLR Euclidean | PCoA: Bray-Curtis | PCoA: CLR Euclidean
Comparison 2 | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG | PDF, SVG
Comparison 3 | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG | PDF, SVG
 
 
 
 
 

Interactive 3D PCoA Plots - Bray-Curtis Dissimilarity

 
 
 

Interactive 3D PCoA Plots - Euclidean Distance

 
 
 

Interactive 3D PCoA Plots - Correlation Coefficients

 
 
 

Group Significance of Beta-diversity Indices

To test whether the between-group dissimilarities are significantly greater than the within-group dissimilarities, the "beta-group-significance" function provided in the QIIME 2 "diversity" package was used with PERMANOVA (permutational multivariate analysis of variance) as the group significance testing method.

Three beta diversity metrics were used: 1) Bray–Curtis dissimilarity, 2) correlation coefficient matrix, and 3) Aitchison distance (Euclidean distance between CLR-transformed compositions).
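The logic of PERMANOVA can be sketched directly from a distance matrix: a pseudo-F statistic compares between-group to within-group sums of squared distances, and its significance is assessed by shuffling the group labels. This is a simplified illustration with made-up one-dimensional data, not the QIIME 2 implementation. The function names, the 999 permutations, and the fixed seed are assumptions.

```python
import random

def pseudo_f(dist, labels):
    """Pseudo-F statistic computed from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:])
        ss_within += pairs / len(idx)
    ss_between = ss_total - ss_within
    a = len(groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical data: two well-separated groups of five samples each.
points = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
labels = ["A"] * 5 + ["B"] * 5
dist = [[abs(x - y) for y in points] for x in points]
f_stat, p_value = permanova(dist, labels)
print(round(f_stat, 2), p_value)
```

Any of the three distance matrices listed above could be used as `dist` in this sketch, since the statistic depends only on the pairwise distances and the group labels.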

 
 
Comparison 1. | Saliva Preterm Mother, B(-) vs Saliva Preterm Mother, B(+) vs Saliva On-term Mother, B(-) | Bray–Curtis | Correlation | Aitchison
Comparison 2. | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | Bray–Curtis | Correlation | Aitchison
Comparison 3. | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | Bray–Curtis | Correlation | Aitchison
 
 
 

X. Analysis - Differential Abundance

16S rRNA next generation sequencing (NGS) generates a fixed number of reads that reflect the proportion of different species in a sample, i.e., the relative abundance of species, instead of the absolute abundance. In mathematics, measurements involving probabilities, proportions, percentages, and ppm can all be thought of as compositional data. This makes the microbiome read count data “compositional” (Gloor et al., 2017). In general, compositional data represent parts of a whole which only carry relative information (http://www.compositionaldata.com/).

The problem of microbiome data being compositional arises when comparing two groups of samples to identify “differentially abundant” species. Even when a species has the same absolute abundance in two conditions, its relative abundance (e.g., percent abundance) can differ between the conditions if the abundance of other species changes greatly. This problem can lead to incorrect conclusions about the differential abundance of microbial species in the samples.

When studying differential abundance (DA), the currently preferred approach is to transform the read count data into log-ratio data. The ratios are calculated between the read count of each species in a sample and a “reference” count (e.g., the mean read count of the sample). The log-ratio data allow the detection of DA species without being affected by the percentage bias mentioned above.
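A small numeric illustration of this bias follows, using made-up absolute abundances. Species A and C are unchanged between the two conditions while species B increases ninefold. Using species C as the reference for the log ratio is an assumption made only for this sketch.

```python
import math

def relative(counts):
    """Convert absolute counts to relative abundances (count / total)."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}

# Hypothetical absolute abundances: only species B changes.
condition_1 = {"A": 100, "B": 100, "C": 50}
condition_2 = {"A": 100, "B": 900, "C": 50}

rel_1 = relative(condition_1)
rel_2 = relative(condition_2)
print("relative abundance of A:", round(rel_1["A"], 3), "vs", round(rel_2["A"], 3))

# Log ratio of A to the reference species C in each condition.
log_ratio_1 = math.log(condition_1["A"] / condition_1["C"])
log_ratio_2 = math.log(condition_2["A"] / condition_2["C"])
print("log ratio A/C:", round(log_ratio_1, 3), "vs", round(log_ratio_2, 3))
```

The relative abundance of A drops from 40% to under 10% even though its absolute abundance is identical, whereas the log ratio of A to C is the same in both conditions.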

In this report, a compositional DA analysis tool “ANCOM” (analysis of composition of microbiomes) was used. ANCOM transforms the count data into log-ratios and thus is more suitable for comparing the composition of microbiomes in two or more populations. "ANCOM" generates a table of features with W-statistics and whether the null hypothesis is rejected. The “W” is the W-statistic, i.e., the number of other features against which a given feature tests as significantly different. Hence, the higher the "W", the stronger the evidence that a feature/species is differentially abundant.


References:

Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134.

Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.

Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7. PMID: 32665548; PMCID: PMC7360769.

 
 

ANCOM Differential Abundance Analysis

 
ANCOM Results for Individual Comparisons
Comparison No. | Comparison Name
Comparison 1. | Saliva Preterm Mother, B(-) vs Saliva Preterm Mother, B(+) vs Saliva On-term Mother, B(-)
Comparison 2. | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-)
Comparison 3. | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-)
 
 

ANCOM-BC2 Differential Abundance Analysis

 

Starting with version V1.2, we include the results of ANCOM-BC (Analysis of Compositions of Microbiomes with Bias Correction) (Lin and Peddada 2020). ANCOM-BC is an updated version of "ANCOM" that:
(a) provides a statistically valid test with appropriate p-values,
(b) provides confidence intervals for differential abundance of each taxon,
(c) controls the False Discovery Rate (FDR),
(d) maintains adequate power, and
(e) is computationally simple to implement.

The bias correction (BC) addresses a challenging problem of the bias introduced by differences in the sampling fractions across samples. This bias has been a major hurdle in performing DA analysis of microbiome data. ANCOM-BC estimates the unknown sampling fractions and corrects the bias induced by their differences among samples. The absolute abundance data are modeled using a linear regression framework.

Starting with version V1.43, ANCOM-BC2 is used instead of ANCOM-BC so that multiple pairwise directional tests can be performed (if there are more than two groups in a comparison). When performing pairwise directional tests, the mixed directional false discovery rate (mdFDR) is taken into account. The mdFDR combines the false discovery rate due to multiple testing, multiple pairwise comparisons, and directional tests within each pairwise comparison. The mdFDR is adopted from Guo, Sarkar, and Peddada (2010) and Grandhi, Guo, and Peddada (2016). For a more detailed explanation and additional features of ANCOM-BC2, please see the authors' documentation.

References:

Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7. PMID: 32665548; PMCID: PMC7360769.

Guo W, Sarkar SK, Peddada SD. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics. 2010 Jun;66(2):485-92. doi: 10.1111/j.1541-0420.2009.01292.x. Epub 2009 Jul 23. PMID: 19645703; PMCID: PMC2895927.

Grandhi A, Guo W, Peddada SD. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinformatics. 2016 Feb 25;17:104. doi: 10.1186/s12859-016-0937-5. PMID: 26917217; PMCID: PMC4768411.

 
 
ANCOM-BC Results for Individual Comparisons
 
Comparison No. | Comparison Name
Comparison 2. | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-)
Comparison 3. | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-)
 
 
 

LEfSe - Linear Discriminant Analysis Effect Size

LEfSe (Linear Discriminant Analysis Effect Size) is an alternative method to find "organisms, genes, or pathways that consistently explain the differences between two or more microbial communities" (Segata et al., 2011). Specifically, LEfSe uses the rank-based Kruskal-Wallis (KW) sum-rank test to detect features with significantly different (relative) abundance with respect to the class of interest. Because the test is rank-based rather than proportion-based, the differential species identified among the comparison groups are less biased than those identified from percent abundance.

Reference:

Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011 Jun 24;12(6):R60. doi: 10.1186/gb-2011-12-6-r60. PMID: 21702898; PMCID: PMC3218848.

 
Saliva Preterm Mother, B(-) vs Saliva Preterm Mother, B(+) vs Saliva On-term Mother, B(-)
 
 
 
 
 
 
 
LEfSe Results for All Comparisons
 
Comparison No. | Comparison Name
Comparison 1. | Saliva Preterm Mother, B(-) vs Saliva Preterm Mother, B(+) vs Saliva On-term Mother, B(-)
Comparison 2. | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-)
Comparison 3. | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-)
 
 

XI. Analysis - Heatmap Profile

 

Species vs Sample Abundance Heatmap for All Samples

 
 
 

Heatmaps for Individual Comparisons

 
A) Two-way clustering - clustered on both columns (Samples) and rows (organism)
Comparison No. | Comparison Name | Family Level | Genus Level | Species Level
Comparison 2 | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
Comparison 3 | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
 
 
B) One-way clustering - clustered on rows (organism) only
Comparison No. | Comparison Name | Family Level | Genus Level | Species Level
Comparison 2 | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
Comparison 3 | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
 
 
C) No clustering
Comparison No. | Comparison Name | Family Level | Genus Level | Species Level
Comparison 2 | Saliva Preterm Baby (+) vs Saliva Preterm Baby (-) vs Saliva On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
Comparison 3 | Stool Preterm Baby (+) vs Stool Preterm Baby (-) vs Stool On-term Baby (-) | PDF, SVG | PDF, SVG | PDF, SVG
 
 

XII. Analysis - Network Association

Network correlation analysis tools are commonly used to analyze the co-occurrence or co-exclusion of microbial species across samples. However, microbiome count data are compositional. If count data are normalized to the total number of counts in the sample, the values are no longer independent, and traditional statistical metrics (e.g., correlation) for detecting species-species relationships can lead to spurious results. In addition, sequencing-based studies typically measure hundreds of OTUs (species) on few samples; thus, inference of OTU-OTU association networks is severely under-powered.

Here we use SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues (Kurtz et al., 2015). SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. SPIEC-EASI provides two algorithms for network inference: 1) Meinshausen-Bühlmann's neighborhood selection (MB method) and 2) inverse covariance selection (GLASSO method, i.e., graphical least absolute shrinkage and selection operator). Both are fundamentally distinct from SparCC, which essentially estimates pairwise correlations.

In addition to these two methods, we provide the results of a third method, SparCC (Sparse Correlations for Compositional Data) (Friedman & Alm 2012), which is also a method for inferring correlations from compositional data. SparCC estimates the linear Pearson correlations between the log-transformed components.


References:

Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol. 2015 May 7;11(5):e1004226. doi: 10.1371/journal.pcbi.1004226. PMID: 25950956; PMCID: PMC4423992.

Friedman J, Alm EJ. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012;8(9):e1002687. doi: 10.1371/journal.pcbi.1002687. Epub 2012 Sep 20. PMID: 23028285; PMCID: PMC3447976.

 

SPIEC-EASI Network Inference by Neighborhood Selection (MB Method)

 

 

 

Association Network Inference by SparCC

 

 

 
 

XIII. Disclaimer

The results of this analysis are for research purposes only. They are not intended to diagnose, treat, cure, or prevent any disease. Forsyth and FOMC are not responsible for use of the information provided in this report outside the research area.

 

Copyright FOMC 2023