16S rRNA Amplicon Sequencing Analysis Service — ASV-Resolved Taxonomic Profiling from Raw FASTQs to Differential-Abundance Tables

16S rRNA amplicon sequencing profiles bacterial and archaeal community composition from hypervariable-region PCR amplicons (Callahan et al., 2016). Pepkio delivers version-pinned DADA2 ASV workflows—taxonomy, diversity, and differential abundance—with code, figures, and a Methods draft for academic, biotech, and pharma clients. Custom analyses are scoped at kickoff. High-coverage 16S runs exceeded 300,000 reads per paired stool and colon sample (Mas-Lloret et al., 2020).

Key facts

Key facts about 16S Amplicon
Fact	Value
Supported platforms / instruments	Illumina MiSeq / NextSeq / NovaSeq (paired- and single-end FASTQ); MGI DNBSEQ-G400 and related DNBSEQ instruments (2×150 or 2×200 PE FASTQ; Anslan et al., 2021); Ion Torrent 16S (`denoise-pyro`) and PacBio CCS amplicons (`denoise-ccs`) when scoped at kickoff
Input requirements	Demultiplexed FASTQ + sample metadata; primer and hypervariable region documented; high-biomass stool/soil typically ≥30,000–50,000 PE reads (rarefaction depth set from alpha-rarefaction curves at kickoff); low-biomass swabs/biopsies often need higher depth—confirmed at kickoff; standard cohort ≤24 samples, one contrast
Reference builds supported	SILVA 138 (16S/18S, default); Greengenes2 2022.10 when primers match classifier training set; UNITE 9.0 for ITS-only projects scoped separately
Primary tools (with versions)	QIIME2 2024.10; DADA2 (q2-dada2) 2024.10.0; q2-cutadapt; fastp 0.24.0; FastQC 0.12.1; MultiQC 1.25.1; phyloseq 1.48.0; vegan 2.6-8.1; ANCOM-BC 2.4.0; MaAsLin2 1.18.0; PICRUSt2 2.5.2 (on request)
Typical turnaround time	3–5 weeks (standard ≤24 samples, one contrast); 5–8 weeks (multi-run merges, multi-contrast designs, or bespoke extensions) — confirmed at kickoff
Deliverable formats	ASV abundance tables (`.csv`, `.biom`, `.qza` on request); taxonomy tables; diversity and differential-abundance results; PDF/SVG figures; HTML MultiQC report; commented R/Python scripts; Methods draft
Key cited best-practice reference	Callahan et al. (2016), Nature Methods; Bokulich et al. (2018), Microbiome
Custom / bespoke analysis	Non-standard inputs, outputs, and methods scoped at kickoff—e.g., PICRUSt2 functional inference, pre-built ASV table re-analysis, client-specified filters, co-occurrence or network extensions, custom reference databases, or alternative differential-abundance models

What is 16S rRNA amplicon sequencing?

16S rRNA amplicon analysis infers amplicon sequence variants (ASVs) from demultiplexed reads, assigns taxonomy against curated reference databases, and applies compositional statistics; Pepkio's service covers this computational workflow, not library prep or PCR. DADA2 resolves variants that differ by a single nucleotide (Callahan et al., 2016). Unlike shotgun metagenomics, 16S targets one marker gene and, with short-read V4 amplicons, typically resolves taxa only to genus level (Johnson et al., 2019); functional profiling requires inference tools such as PICRUSt2 when scoped. The Human Microbiome Compendium reprocessed 168,464 public gut 16S samples (Abdill et al., 2025). Pepkio starts from FASTQs or ASV tables and returns version-pinned outputs. See the 16S amplicon sequencing glossary.

When should you use 16S rRNA amplicon sequencing?

16S amplicon sequencing fits cost-sensitive cohort studies where genus-level taxonomic profiling and alpha/beta diversity are the primary endpoints.

Comparison of 16S amplicon, shotgun metagenomics, and metatranscriptomics
Approach	Best for	Limitations	Approximate cost range
16S amplicon (ASV)	Large cohorts, longitudinal stool/soil studies, genus-level community profiling	Poor species/strain resolution with short-read HV amplicons (Johnson et al., 2019); primer bias; no direct pathway quantification without inference	Lowest sequencing + bioinformatics cost per sample
Shotgun metagenomics	Species/strain resolution, functional pathways, MAG recovery	Higher per-sample cost; host DNA and low biomass reduce effective microbial depth (Hillmann et al., 2018)	Higher than 16S; shallow shotgun (~0.5 M reads) can recover species-level profiles at similar per-sample cost to 16S (Hillmann et al., 2018)
Metatranscriptomics	Active gene expression in mixed communities	RNA instability; rRNA depletion required; highest analytical complexity	Highest complexity and cost

IBD flare dynamics: Lloyd-Price et al. (2019) followed 132 longitudinal IBD subjects with 16S, metagenomics, and metabolomics, reporting increased facultative anaerobes during active disease.
Global gut geography: Abdill et al. (2025) integrated 168,464 public gut 16S samples and found composition varied by world region and associated with primer choice and DNA extraction kit.
Medication–microbiome interactions: Kumar et al. (2025) linked prescription drug exposure to gut microbiome shifts and enteric infection risk in population and mouse model studies.

How the analysis works — step by step

1. Validate inputs and sample metadata
Pepkio verifies FASTQ integrity, read layout, and records sample IDs, condition, batch, sequencing run, primers, and hypervariable region in `sample_manifest.csv`. Missing primer metadata or mismatched read structure is flagged before import.
Tools and outputs
Tools used: Custom validation scripts; MD5 checksum verification
Output: sample_manifest.csv with sample IDs, platform, read counts, primer/HV region, and QC flags
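A minimal Python sketch of the kind of manifest checks this step describes. It assumes a CSV with hypothetical column names (`sample_id`, `condition`, `batch`, `run`, `primer_f`, `primer_r`, `hv_region`); the real `sample_manifest.csv` schema and Pepkio's validation scripts may differ. It flags duplicate sample IDs and missing primer or region metadata, and computes the MD5 digest used for checksum comparison.

```python
import csv
import hashlib
import io

# Hypothetical schema for illustration; the real manifest columns are agreed at kickoff.
REQUIRED_COLUMNS = ["sample_id", "condition", "batch", "run", "primer_f", "primer_r", "hv_region"]
METADATA_FIELDS = ("primer_f", "primer_r", "hv_region")


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_manifest(text):
    """Return a list of human-readable QC flags for a manifest supplied as CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    flags, seen = [], set()
    for row in reader:
        sample_id = (row["sample_id"] or "").strip()
        if sample_id in seen:
            flags.append(f"{sample_id}: duplicate sample ID")
        seen.add(sample_id)
        for field in METADATA_FIELDS:
            # Short rows leave trailing fields as None, so treat None like an empty cell.
            if not (row[field] or "").strip():
                flags.append(f"{sample_id}: missing {field}")
    return flags
```

In practice the digests returned by `md5sum` would be compared against checksums supplied with the FASTQ delivery before any import.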
2. QC and trim raw reads
Adapter contamination, low-quality tails, and overrepresented sequences are assessed before denoising (Chen et al., 2018; Ewels et al., 2016). Per-sample read counts and quality distributions are aggregated for review.
Tools and outputs
Tools used: FastQC 0.12.1; fastp 0.24.0; MultiQC 1.25.1
Output: Per-sample FastQC/fastp reports; multiqc_report.html
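The quality distributions summarized by FastQC and fastp come from decoding each quality character as a Phred score (ASCII code minus 33 for Phred+33 data). The sketch below is an illustration, not either tool's implementation: it computes per-position mean quality from FASTQ text. `suggest_truncation` is a hypothetical, deliberately crude heuristic that returns the first position whose mean falls below a chosen score; actual truncation lengths are chosen by reviewing the quality plots.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ lines (4-line records)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at {header.strip()!r}")
        yield header[1:].strip(), seq, qual


def mean_quality_by_position(records, offset=33):
    """Per-position mean Phred score; offset=33 decodes Phred+33 quality strings."""
    totals, counts = [], []
    for _, _, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


def suggest_truncation(means, min_mean_q=30):
    """Hypothetical heuristic: first 0-based position whose mean quality is below min_mean_q.

    Returns len(means) when no position falls below the threshold, so the result can be
    read as a candidate truncation length.
    """
    for pos, q in enumerate(means):
        if q < min_mean_q:
            return pos
    return len(means)
```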
3. Import demultiplexed reads into QIIME2
Reads are imported using a sample manifest. Pepkio confirms Phred+33 encoding and records per-sample read counts in the demultiplexed-read summary (Bolyen et al., 2019).
Tools and outputs
Tools used: QIIME2 2024.10 qiime tools import
Output: demux.qza; demux-summary.qzv with per-sample read counts and quality score boxplots
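QIIME2's paired-end manifest import expects a tab-separated file with `sample-id`, `forward-absolute-filepath`, and `reverse-absolute-filepath` columns (the `PairedEndFastqManifestPhred33V2` format). A minimal sketch that builds one from a dictionary of file paths; the paths in any example are hypothetical, and this sketch does not handle `$PWD`-style variables in paths.

```python
import os


def build_qiime2_manifest(samples):
    """Return manifest text in QIIME2's PairedEndFastqManifestPhred33V2 layout.

    `samples` maps sample ID -> (forward FASTQ path, reverse FASTQ path).
    The format expects absolute paths, so relative paths are rejected here.
    """
    lines = ["sample-id\tforward-absolute-filepath\treverse-absolute-filepath"]
    # Sorting by sample ID keeps the output deterministic between runs.
    for sample_id, (forward, reverse) in sorted(samples.items()):
        for path in (forward, reverse):
            if not os.path.isabs(path):
                raise ValueError(f"{sample_id}: expected an absolute path, got {path!r}")
        lines.append(f"{sample_id}\t{forward}\t{reverse}")
    return "\n".join(lines) + "\n"
```

The written file would then be imported with something like `qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest.tsv --input-format PairedEndFastqManifestPhred33V2 --output-path demux.qza`; confirm the flags against the pinned QIIME2 release.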
4. Trim PCR primers
Client-supplied primers are removed with q2-cutadapt before DADA2. Untrimmed primers inflate chimera rates (Bokulich et al., 2018).
Tools and outputs
Tools used: QIIME2 2024.10 q2-cutadapt
Output: trimmed-demux.qza; primer-trim statistics table
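16S primers often contain IUPAC degenerate bases (for example, 515F `GTGYCAGCMGCCGCGGTAA`), so primer matching has to treat a code such as Y as "C or T". The sketch below shows only that idea: it expands IUPAC codes into a regular expression and removes a primer found exactly at the 5' end of a read. It is a simplified illustration, not how cutadapt works internally; cutadapt also tolerates mismatches and indels up to a configurable error rate.

```python
import re

# IUPAC nucleotide codes as regex character classes (degenerate primer positions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_pattern(primer):
    """Compile a degenerate primer into a regex matching every concrete sequence it encodes."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_5prime_primer(read, primer):
    """Remove the primer if it matches exactly at the read's 5' end.

    Returns (trimmed_read, found). No mismatches or indels are tolerated, unlike cutadapt.
    """
    match = primer_pattern(primer).match(read.upper())  # re.match anchors at position 0
    if match is None:
        return read, False
    return read[match.end():], True
```

In q2-cutadapt the corresponding primers are supplied with `--p-front-f` and `--p-front-r`, and the mismatch tolerance is set by the error-rate parameter.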
5. Denoise to ASVs per sequencing run
DADA2 runs separately on each sequencing run because error profiles differ by instrument and flowcell (Callahan et al., 2016; Anslan et al., 2021). Truncation lengths (`--p-trunc-len-f` and `--p-trunc-len-r`) are set from quality plots; paired-end merging requires the truncated forward and reverse reads to overlap (DADA2's default minimum overlap is 12 nt). Low read retention triggers parameter review before proceeding.
Tools and outputs
Tools used: QIIME2 2024.10 q2-dada2 (denoise-paired, denoise-single, denoise-pyro, or denoise-ccs when scoped)
Output: feature-table.qza; rep-seqs.qza; dada2-stats.qza with input, filtered, denoised, merged, and non-chimeric read counts per sample
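The per-sample DADA2 stage counts make "low read retention" concrete. A hedged sketch that flags samples whose non-chimeric reads fall below a fraction of input and names the stage that lost the most reads; the 0.5 cut-off is a hypothetical value, not a Pepkio threshold. As a rule of thumb from the DADA2 tutorial, a large loss at merging usually points to insufficient read overlap, and a large loss at chimera removal can indicate primers that were not removed.

```python
# Stage names follow the DADA2 stats columns listed above.
STAGES = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def retention_fraction(row):
    """Fraction of input reads that survive to the non-chimeric stage."""
    return row["non-chimeric"] / row["input"] if row["input"] else 0.0


def largest_loss_stage(row):
    """Name the stage that removed the most reads relative to the previous stage."""
    losses = {STAGES[i]: row[STAGES[i - 1]] - row[STAGES[i]] for i in range(1, len(STAGES))}
    return max(losses, key=losses.get)


def flag_low_retention(stats, min_fraction=0.5):
    """Return {sample_id: (fraction, stage)} for samples below min_fraction (hypothetical cut-off)."""
    flags = {}
    for sample_id, row in stats.items():
        fraction = retention_fraction(row)
        if fraction < min_fraction:
            flags[sample_id] = (round(fraction, 3), largest_loss_stage(row))
    return flags
```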
6. Merge runs and filter contaminants
Feature tables and representative sequences from separate runs are merged into a single cohort table. Using the taxonomy assigned in step 7, mitochondrial, chloroplast, and unassigned features are removed; features below the documented abundance or prevalence threshold are also filtered (Nikodemova et al., 2023).
Tools and outputs
Tools used: QIIME2 2024.10 feature-table merge; filter-features; custom filtering scripts
Output: table-filtered.qza; rep-seqs-filtered.qza; asv_filter_log.csv documenting removed features and thresholds
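A minimal sketch of the filtering logic and its log, assuming an in-memory table of the form `feature_id -> {sample_id: count}` and SILVA-style taxon strings (for example `f__Mitochondria`). The per-sample copy threshold of 10 reads echoes the example given later on this page; `min_samples` is a hypothetical prevalence parameter, and features missing from the taxonomy are treated as unassigned. Real projects apply the thresholds recorded in `asv_filter_log.csv`.

```python
# Case-insensitive substrings marking organelle-derived features in SILVA-style taxonomy.
ORGANELLE_TERMS = ("mitochondria", "chloroplast")


def filter_features(table, taxonomy, min_copies=10, min_samples=1):
    """Filter a feature table and log why each removed feature was dropped.

    table: feature_id -> {sample_id: count}; taxonomy: feature_id -> taxon string.
    Assumption of this sketch: features absent from `taxonomy` count as unassigned.
    """
    kept, log = {}, []
    for feature_id, counts in table.items():
        taxon = taxonomy.get(feature_id, "Unassigned").lower()
        if taxon.startswith("unassigned"):
            log.append((feature_id, "unassigned"))
            continue
        if any(term in taxon for term in ORGANELLE_TERMS):
            log.append((feature_id, "organelle"))
            continue
        # Per-sample copy threshold: counts below min_copies are zeroed (dropped from the sparse dict).
        retained = {sample: count for sample, count in counts.items() if count >= min_copies}
        if len(retained) < min_samples:
            log.append((feature_id, f"present at >= {min_copies} reads in fewer than {min_samples} samples"))
            continue
        kept[feature_id] = retained
    return kept, log
```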
7. Assign taxonomy
ASVs are classified with a Naive Bayes classifier trained on SILVA 138 (default), or on Greengenes2 2022.10 when primers match the training amplicon (Bokulich et al., 2018). Classification confidence scores are retained for downstream filtering.
Tools and outputs
Tools used: QIIME2 2024.10 classify-sklearn
Output: taxonomy.qza; taxonomy.tsv with Feature ID, Taxon, and Confidence
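The exported `taxonomy.tsv` stores each lineage as one semicolon-separated string, while the deliverable tables list ranks as separate Kingdom–Genus columns. A sketch of that reshaping, assuming SILVA-style rank prefixes (`d__`, `p__`, … `g__`) and mapping the SILVA domain to the Kingdom column. The 0.7 flag threshold mirrors what we understand to be classify-sklearn's default confidence; the project threshold is set at kickoff.

```python
import csv
import io

# SILVA-style rank prefixes mapped to deliverable column names (d__ is the SILVA domain).
RANKS = [("d__", "Kingdom"), ("p__", "Phylum"), ("c__", "Class"),
         ("o__", "Order"), ("f__", "Family"), ("g__", "Genus")]


def parse_taxonomy_tsv(text, min_confidence=0.7):
    """Split 'Feature ID / Taxon / Confidence' rows into one column per rank and flag low confidence."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        levels = {}
        for part in record["Taxon"].split(";"):
            part = part.strip()
            for prefix, rank in RANKS:
                if part.startswith(prefix) and len(part) > len(prefix):
                    levels[rank] = part[len(prefix):]
        row = {"FeatureID": record["Feature ID"]}
        row.update({rank: levels.get(rank, "") for _, rank in RANKS})  # empty string = unresolved rank
        row["Confidence"] = float(record["Confidence"])
        row["LowConfidence"] = row["Confidence"] < min_confidence
        rows.append(row)
    return rows
```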
8. Build phylogeny and compute diversity
Representative sequences are aligned and a phylogenetic tree is built. Alpha diversity (Shannon, Faith phylogenetic diversity) and beta diversity (weighted and unweighted UniFrac) are then computed on rarefied or normalized tables, as documented in the Methods draft. PERMANOVA tests for group differences in beta diversity, including recorded covariates when provided (Anderson, 2001; McMurdie & Holmes, 2013).
Tools and outputs
Tools used: QIIME2 2024.10 phylogeny align-to-tree-mafft-fasttree; q2-diversity; phyloseq 1.48.0; vegan 2.6-8.1
Output: rooted-tree.qza; diversity_results/ (alpha and beta metrics); permanova_results.csv; rarefaction curves
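PERMANOVA compares among-group and within-group sums of squared distances and assesses significance by permuting group labels (Anderson, 2001). The sketch below implements the one-factor pseudo-F and permutation p-value from a square distance matrix such as a UniFrac matrix. It is a teaching sketch only: production runs use q2-diversity or vegan, which also handle covariates and other designs this code does not.

```python
import random


def pseudo_f(dist, groups):
    """One-factor PERMANOVA pseudo-F (Anderson, 2001) from a square distance matrix and labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more samples than groups")
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: each group's squared pairwise distances, divided by its size.
    ss_within = 0.0
    for label in labels:
        members = [i for i, g in enumerate(groups) if g == label]
        k = len(members)
        ss_within += sum(
            dist[members[p]][members[q]] ** 2 for p in range(k) for q in range(p + 1, k)
        ) / k
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) by shuffling group labels across samples."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    at_least_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_extreme + 1) / (permutations + 1)
```

With very small groups the attainable p-value is limited by the number of distinct labellings, which is one reason group sizes are discussed at kickoff.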
9. Test differential abundance
ANCOM-BC is the default compositional differential-abundance method (Lin & Peddada, 2020). MaAsLin2 fits multivariable models with continuous and categorical covariates when scoped (Mallick et al., 2021). Contrasts, reference levels, and covariates are confirmed at kickoff.
Tools and outputs
Tools used: ANCOM-BC 2.4.0; MaAsLin2 1.18.0 (when scoped)
Output: da_results_<contrast>.csv with taxon, effect size, p-value, q-value, and contrast metadata
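The q-values reported alongside p-values are multiple-testing-adjusted. As an illustration, the sketch below implements the Benjamini-Hochberg procedure, which we understand to be MaAsLin2's default correction; ANCOM-BC exposes its own adjustment argument, so the method used in a given project depends on the tool and its settings.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```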
10. Package deliverables
ASV tables, taxonomy, diversity results, figures, scripts, README, Methods draft, and HTML QC report are assembled per agreed retention policy.
Tools and outputs
Tools used: MultiQC 1.25.1; custom export and plotting scripts
Output: Deliverable bundle with processed tables, figures, code, QC report, and Methods draft

What Pepkio delivers

Processed data files

asv_table.csv; asv_taxonomy.tsv; sample_metadata_merged.csv; dada2_denoising_stats.csv
diversity_alpha.csv; diversity_beta_distance_matrix.csv; permanova_results.csv; da_results_<contrast>.csv
.biom and .qza on request

Figures (PDF/SVG)

Quality boxplots; DADA2 read-retention bar chart; rarefaction curves
Phylum/genus stacked barplots; alpha-diversity boxplots by group
PCoA or NMDS ordination; differential-abundance effect-size or volcano plots

Tables

SampleID, ASV feature IDs, Kingdom–Genus, Confidence
DADA2 stage counts (input, filtered, denoised, merged, non-chimeric)
DA statistics (lfc or coef, p_val, q_val, contrast)

Code

Commented QIIME2 CLI export scripts plus R (phyloseq, ANCOM-BC) or Python analysis scripts
Conda lockfile or sessionInfo() export; delivery via private Git repository or agreed file transfer

Documentation

HTML/PDF QC report; README with reproduction steps
Methods draft listing tool versions, reference database builds, primer-trimming parameters, DADA2 truncation settings, and filtering thresholds
Post-delivery reviewer support within agreed scope (typically ≤20% of deliverables)

Technical decisions we make — and why

ASVs via DADA2, not 97% OTU clustering: DADA2 resolves single-nucleotide variants; mock-community benchmarks recovered more true variants and fewer spurious sequences than OTU picking (Callahan et al., 2016). Legacy OTU output is scoped on request.
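Separate DADA2 per sequencing run, merge downstream: DADA2 learns its error model from the reads it is given, and runs from different instruments or flowcells have different error profiles, so learning one model from reads pooled across runs violates its assumptions (Callahan et al., 2016). MGI and Illumina runs are denoised independently and merged at the feature-table stage (Anslan et al., 2021).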
Default classifier: SILVA 138 Naive Bayes (classify-sklearn): Greengenes2 2022.10 is used when client primers match the classifier training amplicon (e.g., 515F/806R V4). Classifier choice and confidence thresholds are documented because mis-assignment rates vary by hypervariable region (Bokulich et al., 2018; Johnson et al., 2019).
Default differential abundance: ANCOM-BC: Microbiome count data are compositional; ANCOM-BC accounts for sampling fraction and structural zeros (Lin & Peddada, 2020). MaAsLin2 is scoped for multivariable models with multiple covariates (Mallick et al., 2021).
Low-abundance filtering: per-sample copy threshold or cohort-level prevalence: Global filters removing features below 0.1% dataset abundance discard reproducible rare taxa; per-sample copy thresholds (e.g., ≥10 reads) improve replicate reliability while retaining more low-abundance signal (Nikodemova et al., 2023). Exact thresholds are documented in asv_filter_log.csv.
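Merging per-run tables works because ASVs are exact sequences: by default, q2-dada2 uses the MD5 hash of each ASV sequence as its feature ID, so the same variant denoised in two runs receives the same ID. A minimal sketch of the merge under that assumption, using plain dictionaries keyed by sequence; it refuses samples that appear in more than one run rather than silently summing them.

```python
import hashlib


def feature_id(sequence):
    """MD5 hex digest of an ASV sequence, matching the hashed-ID convention described above."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def merge_runs(*run_tables):
    """Merge per-run tables (ASV sequence -> {sample_id: count}) into one table keyed by feature ID."""
    merged, seen_samples = {}, set()
    for table in run_tables:
        run_samples = {sample for counts in table.values() for sample in counts}
        overlap = seen_samples & run_samples
        if overlap:
            raise ValueError(f"samples present in more than one run: {sorted(overlap)}")
        seen_samples |= run_samples
        for sequence, counts in table.items():
            merged.setdefault(feature_id(sequence), {}).update(counts)
    return merged
```

Identical variants only collapse if every run yields sequences spanning the same region, so primer trimming (and, for single-end data, truncation length) must be consistent across runs.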

Common questions

What is the minimum sequencing depth and sample count for 16S amplicon analysis?

For high-biomass samples (stool, soil), Pepkio typically recommends ≥30,000–50,000 paired-end reads per sample, with rarefaction depth set from alpha-rarefaction curves at kickoff. Low-biomass swabs or biopsies often need higher depth. Standard projects cover ≤24 samples with one primary contrast; larger cohorts are scoped at kickoff.
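An alpha-rarefaction curve repeatedly subsamples each sample to increasing depths and records richness; the depth at which curves flatten informs the rarefaction depth, and samples with fewer reads than a given depth drop out at that point. A minimal sketch using observed features as the richness metric; the function names and iteration count are illustrative assumptions.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample a sample's feature counts to `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand counts into one entry per read; fine for illustration, memory-heavy at scale.
    reads = [feature for feature, count in counts.items() for _ in range(count)]
    subsample = {}
    for feature in rng.sample(reads, depth):
        subsample[feature] = subsample.get(feature, 0) + 1
    return subsample


def observed_features_curve(counts, depths, iterations=10, seed=0):
    """Mean observed-feature richness at each depth; depths above the sample total are skipped."""
    rng = random.Random(seed)
    total = sum(counts.values())
    curve = {}
    for depth in depths:
        if depth > total:
            continue  # the sample drops out of the curve at this depth
        richness = [len(rarefy(counts, depth, rng)) for _ in range(iterations)]
        curve[depth] = sum(richness) / iterations
    return curve
```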

Can you analyze poor-quality or low-yield 16S libraries?

Yes, with documented caveats. Samples with low read counts after DADA2 are flagged in dada2_denoising_stats.csv and may lack power for rare taxa or subgroup comparisons. Pepkio proceeds when clients accept reduced sensitivity; re-sequencing is discussed when input reads are very low for the matrix type.

Do you support Illumina, MGI DNBSEQ, Ion Torrent, and PacBio 16S data?

Yes for Illumina MiSeq, NextSeq, and NovaSeq paired- and single-end FASTQs. MGI DNBSEQ FASTQs (e.g., G400; 2×150 or 2×200) are supported with DADA2 parameters tuned per read length (Anslan et al., 2021). Ion Torrent data use denoise-pyro; PacBio CCS amplicons use denoise-ccs when scoped at kickoff.

How long does a 16S amplicon analysis project take at Pepkio?

Standard projects (≤24 samples, one contrast, single or few sequencing runs) typically complete in 3–5 weeks. Multi-run merges, multi-contrast designs, PICRUSt2 extensions, or cohorts >24 samples may require 5–8 weeks—all confirmed at kickoff.

How do you handle batch effects across sequencing runs or extraction kits?

Sequencing run, extraction kit, and primer choice are recorded in metadata and QC reports. DADA2 runs per sequencing run; PERMANOVA and MaAsLin2 can include batch as a covariate when scoped. Abdill et al. (2025) showed that primer choice and extraction kit are associated with compositional variation, so Pepkio documents these confounders rather than over-interpreting batch-driven shifts.

Do I own the code — and in what format is it delivered?

Yes — you retain full ownership of code, scripts, and results. Pepkio delivers commented QIIME2 export scripts plus R or Python analysis code with conda lockfiles. Jupyter or R Markdown notebooks are available on request.

Can I be involved during analysis?

Yes. Checkpoint reviews occur after raw QC, after DADA2 denoising (before differential testing), and before final delivery. You can review metadata, covariates, filtering thresholds, and contrasts within agreed scope. A PhD-level bioinformatician serves as your primary contact at each milestone.

What does post-delivery reviewer support include?

Clarification of methods, DADA2 parameters, filtering logic, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Substantial new analyses—additional contrasts, re-running with alternate databases, or PICRUSt2 extensions—are scoped as separate milestones.

Is co-authorship required?

No. Pepkio operates as a fee-for-service provider and does not require co-authorship unless explicitly discussed in advance.

Should we use V4 or V3–V4 primers for our study?

Primer choice affects taxonomic coverage and resolution. Johnson et al. (2019) showed short-read hypervariable-region amplicons cannot match full-length 16S species-level resolution; V4 (515F/806R) is widely used in Earth Microbiome Project protocols (Thompson et al., 2017). V3–V4 amplicons (~460 bp) offer broader coverage but require sufficient paired-end overlap—confirm read length against amplicon size at kickoff.
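The overlap check is simple arithmetic: the forward and reverse reads share (forward length + reverse length − amplicon length) bases. For the ~460 bp V3–V4 amplicon above, untruncated 2×300 reads overlap by about 140 bp, 2×250 reads by about 40 bp, and 2×150 reads cannot meet at all. A sketch, assuming DADA2's 12 nt default minimum overlap; `length_margin` is a hypothetical allowance for taxa whose amplicons run longer than the nominal length.

```python
DADA2_MIN_OVERLAP = 12  # DADA2's default minimum overlap when merging read pairs


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads across an amplicon of amplicon_len bp."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=DADA2_MIN_OVERLAP, length_margin=0):
    """True if the expected overlap still meets min_overlap after allowing for longer amplicons.

    length_margin is a hypothetical allowance for natural amplicon-length variation between taxa.
    """
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len + length_margin) >= min_overlap
```

Here `amplicon_len` should be the length actually spanned by the reads (for example, after primer removal), so the numbers above are approximate.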

Why does Pepkio default to ASVs instead of 97% OTU clustering?

ASVs represent exact sequences and avoid arbitrary clustering thresholds that merge distinct variants (Callahan et al., 2016). Denoising removes spurious sequences more effectively than distance-based OTU picking in mock-community benchmarks. Legacy OTU output for method comparison is scoped on request.
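The arithmetic behind "arbitrary clustering thresholds that merge distinct variants": for a 253 bp V4 ASV (a commonly reported length for 515F/806R amplicons after primer removal; treat it as illustrative), a single-nucleotide difference still gives about 99.6% identity, and up to seven mismatches stay at or above 97%. The sketch below counts mismatches between equal-length, pre-aligned sequences only; real OTU clustering tools align sequences and handle gaps.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two equal-length, already aligned sequences (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def max_mismatches_within(length, threshold_pct=97):
    """Largest number of mismatches that still meets an integer percent-identity threshold."""
    return length * (100 - threshold_pct) // 100
```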

Can you run PICRUSt2 or other custom non-standard 16S analyses?

Yes, when scoped at kickoff. PICRUSt2 functional inference, co-occurrence or network analysis, pre-built ASV table re-analysis with alternate databases, client-specified differential-abundance models, and custom reference classifiers are agreed before analysis begins.

Related services

Shotgun metagenomics — Species- and strain-level taxonomic profiling plus direct functional pathway quantification when 16S resolution is insufficient.
Metatranscriptomics — Active gene expression profiling to complement taxonomic snapshots from 16S amplicons.
Metagenomics analysis services — Hub page comparing 16S, shotgun, and metatranscriptomic entry points with shared QC standards.
Custom consulting — Amplicon-vs-shotgun feasibility review and primer selection before committing to sequencing.

References

Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nature Methods. 2016;13(7):581–583. https://doi.org/10.1038/nmeth.3869 (PMID: 27214047)
Bokulich NA, Kaehler BD, Rideout JR, et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. Microbiome. 2018;6:90. https://doi.org/10.1186/s40168-018-0470-z (PMID: 29773078)
Bolyen E, Rideout JR, Dillon MR, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 2019;37(8):852–857. https://doi.org/10.1038/s41587-019-0209-9 (PMID: 31341288)
Johnson JS, Spakowicz DJ, Hong BY, et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nature Communications. 2019;10:5029. https://doi.org/10.1038/s41467-019-13036-1 (PMID: 31695033)
McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8(4):e61217. https://doi.org/10.1371/journal.pone.0061217 (PMID: 23630581)
Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nature Communications. 2020;11:3514. https://doi.org/10.1038/s41467-020-17041-7 (PMID: 32665548)
Mallick H, Rahnavard A, McIver LJ, et al. Multivariable association discovery in population-scale meta-omics studies. PLoS Computational Biology. 2021;17(11):e1009442. https://doi.org/10.1371/journal.pcbi.1009442 (PMID: 34784344)
Mas-Lloret J, Obón-Santacana M, Ibáñez-Sanz G, et al. Gut microbiome diversity detected by high-coverage 16S and shotgun sequencing of paired stool and colon sample. Scientific Data. 2020;7:92. https://doi.org/10.1038/s41597-020-0427-5 (PMID: 32179734)
Lloyd-Price J, Arze C, Ananthakrishnan AN, et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019;569(7758):655–662. https://doi.org/10.1038/s41586-019-1237-9 (PMID: 31142855)
Thompson LR, Sanders JG, McDonald D, et al. A communal catalogue reveals Earth's multiscale microbial diversity. Nature. 2017;551(7681):457–463. https://doi.org/10.1038/nature24621 (PMID: 29088705)
Anslan S, Mikryukov V, Armolaitis K, et al. Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms. PeerJ. 2021;9:e12254. https://doi.org/10.7717/peerj.12254 (PMID: 34703674)
Nikodemova M, Holzhausen EA, Deblois CL, et al. The effect of low-abundance OTU filtering methods on the reliability and variability of microbial composition assessed by 16S rRNA amplicon sequencing. Frontiers in Cellular and Infection Microbiology. 2023;13:1165295. https://doi.org/10.3389/fcimb.2023.1165295 (PMID: 37377642)
Abdill RJ, Graham SP, Rubinetti V, et al. Integration of 168,000 samples reveals global patterns of the human gut microbiome. Cell. 2025;188(4):1100–1118.e17. https://doi.org/10.1016/j.cell.2024.12.017 (PMID: 39848248)
Kumar A, Sun R, Habib B, et al. Identification of medication–microbiome interactions that affect gut infection. Nature. 2025;644:506–515. https://doi.org/10.1038/s41586-025-09273-8 (PMID: 40670788)
Hillmann B, Al-Ghalith GA, Shields-Cutler RR, et al. Evaluating the information content of shallow shotgun metagenomics. mSystems. 2018;3(6):e00069-18. https://doi.org/10.1128/mSystems.00069-18 (PMID: 30443602)
Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecology. 2001;26(1):32–46. https://doi.org/10.1111/j.1442-9993.2001.01070.x
Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. https://doi.org/10.1093/bioinformatics/btw354 (PMID: 27312411)
Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. https://doi.org/10.1093/bioinformatics/bty624 (PMID: 30423086)

Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
