16S rRNA Amplicon Sequencing Analysis Service — ASV-Resolved Taxonomic Profiling from Raw FASTQs to Differential-Abundance Tables

16S rRNA amplicon sequencing profiles bacterial and archaeal community composition from hypervariable-region PCR amplicons (Callahan et al., 2016). Pepkio delivers version-pinned DADA2 ASV workflows covering taxonomy, diversity, and differential abundance, with code, figures, and a Methods draft for academic, biotech, and pharma clients. Custom analyses are scoped at kickoff. As a depth benchmark, high-coverage 16S runs of paired stool and colon samples have exceeded 300,000 reads per sample (Mas-Lloret et al., 2020).

Key facts

Key facts about 16S Amplicon
Fact | Value
Supported platforms / instruments | Illumina MiSeq / NextSeq / NovaSeq (paired- and single-end FASTQ); MGI DNBSEQ-G400 and related DNBSEQ instruments (2×150 or 2×200 PE FASTQ; Anslan et al., 2021); Ion Torrent 16S (`denoise-pyro`) and PacBio CCS amplicons (`denoise-ccs`) when scoped at kickoff
Input requirements | Demultiplexed FASTQ + sample metadata; primer and hypervariable region documented; high-biomass stool/soil typically ≥30,000–50,000 PE reads per sample (rarefaction depth set from alpha-rarefaction curves at kickoff); low-biomass swabs/biopsies often need higher depth, confirmed at kickoff; standard cohort ≤24 samples, one contrast
Reference builds supported | SILVA 138 (16S/18S, default); Greengenes2 2022.10 when primers match classifier training set; UNITE 9.0 for ITS-only projects scoped separately
Primary tools (with versions) | QIIME2 2024.10; DADA2 (q2-dada2) 2024.10.0; q2-cutadapt; fastp 0.24.0; FastQC 0.12.1; MultiQC 1.25.1; phyloseq 1.48.0; vegan 2.6-8.1; ANCOM-BC 2.4.0; MaAsLin2 1.18.0; PICRUSt2 2.5.2 (on request)
Typical turnaround time | 3–5 weeks (standard ≤24 samples, one contrast); 5–8 weeks (multi-run merges, multi-contrast designs, or bespoke extensions); confirmed at kickoff
Deliverable formats | ASV abundance tables (`.csv`, `.biom`, `.qza` on request); taxonomy tables; diversity and differential-abundance results; PDF/SVG figures; HTML MultiQC report; commented R/Python scripts; Methods draft
Key cited best-practice reference | Callahan et al. (2016), Nature Methods; Bokulich et al. (2018), Microbiome
Custom / bespoke analysis | Non-standard inputs, outputs, and methods scoped at kickoff, e.g., PICRUSt2 functional inference, pre-built ASV table re-analysis, client-specified filters, co-occurrence or network extensions, custom reference databases, or alternative differential-abundance models

What is 16S rRNA amplicon sequencing?

16S rRNA amplicon sequencing analysis infers amplicon sequence variants (ASVs) from demultiplexed reads, assigns taxonomy against curated reference databases, and applies compositional statistics; it does not cover library prep or PCR itself. DADA2 resolves variants differing by one nucleotide (Callahan et al., 2016). Unlike shotgun metagenomics, 16S targets one marker gene and typically resolves taxa to genus level with short-read V4 amplicons (Johnson et al., 2019); functional profiling requires inference tools such as PICRUSt2 when scoped. The Human Microbiome Compendium reprocessed 168,464 public gut 16S samples (Abdill et al., 2025). Pepkio starts from FASTQs or ASV tables and returns version-pinned outputs. See the 16S amplicon sequencing glossary.

When should you use 16S rRNA amplicon sequencing?

16S amplicon sequencing fits cost-sensitive cohort studies where genus-level taxonomic profiling and alpha/beta diversity are the primary endpoints.

Comparison of 16S amplicon, shotgun metagenomics, and metatranscriptomics
Approach | Best for | Limitations | Approximate cost range
16S amplicon (ASV) | Large cohorts, longitudinal stool/soil studies, genus-level community profiling | Poor species/strain resolution with short-read HV amplicons (Johnson et al., 2019); primer bias; no direct pathway quantification without inference | Lowest sequencing + bioinformatics cost per sample
Shotgun metagenomics | Species/strain resolution, functional pathways, MAG recovery | Higher per-sample cost; host DNA and low biomass reduce effective microbial depth (Hillmann et al., 2018) | Higher than 16S; shallow shotgun (~0.5 M reads) can recover species-level profiles at similar per-sample cost to 16S (Hillmann et al., 2018)
Metatranscriptomics | Active gene expression in mixed communities | RNA instability; rRNA depletion required; highest analytical complexity | Highest complexity and cost
  • IBD flare dynamics: Lloyd-Price et al. (2019) followed 132 longitudinal IBD subjects with 16S, metagenomics, and metabolomics, reporting increased facultative anaerobes during active disease.
  • Global gut geography: Abdill et al. (2025) integrated 168,464 public gut 16S samples and found composition varied by world region and associated with primer choice and DNA extraction kit.
  • Medication–microbiome interactions: Kumar et al. (2025) linked prescription drug exposure to gut microbiome shifts and enteric infection risk in population and mouse model studies.

How the analysis works — step by step

  1. Validate inputs and sample metadata

    Pepkio verifies FASTQ integrity, read layout, and records sample IDs, condition, batch, sequencing run, primers, and hypervariable region in `sample_manifest.csv`. Missing primer metadata or mismatched read structure is flagged before import.

    Tools and outputs

    Tools used: Custom validation scripts; MD5 checksum verification

    Output: sample_manifest.csv with sample IDs, platform, read counts, primer/HV region, and QC flags
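As a minimal sketch of the checks described in this step, the snippet below computes an MD5 digest and reports manifest rows with missing required fields. The column names and helper functions are illustrative assumptions, not Pepkio's actual validation scripts; the real manifest layout is agreed at kickoff.

```python
import csv
import hashlib
import io

# Hypothetical column names used only for this illustration.
REQUIRED_COLUMNS = ["sample_id", "condition", "batch", "run", "primers", "hv_region"]


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string (e.g. a FASTQ file's contents)."""
    return hashlib.md5(data).hexdigest()


def missing_fields(manifest_text: str) -> dict:
    """Map each sample ID to the required columns that are empty or absent."""
    problems = {}
    for row in csv.DictReader(io.StringIO(manifest_text)):
        empty = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if empty:
            problems[row.get("sample_id", "<unknown>")] = empty
    return problems


# Made-up manifest: sample S2 has no primer recorded.
manifest = (
    "sample_id,condition,batch,run,primers,hv_region\n"
    "S1,case,B1,run1,515F/806R,V4\n"
    "S2,control,B1,run1,,V4\n"
)
print(missing_fields(manifest))
```

Here S2 would be flagged for its missing primer entry before any reads are imported.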

  2. QC and trim raw reads

    Adapter contamination, low-quality tails, and overrepresented sequences are assessed before denoising (Chen et al., 2018; Ewels et al., 2016). Per-sample read counts and quality distributions are aggregated for review.

    Tools and outputs

    Tools used: FastQC 0.12.1; fastp 0.24.0; MultiQC 1.25.1

    Output: Per-sample FastQC/fastp reports; multiqc_report.html

  3. Import demultiplexed reads into QIIME2

    Reads are imported via a sample manifest. Pepkio confirms Phred+33 encoding and records per-sample read counts in the demultiplex summary (Bolyen et al., 2019).

    Tools and outputs

    Tools used: QIIME2 2024.10 qiime tools import

    Output: demux.qza; demux-summary.qzv with per-sample read counts and quality score boxplots
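One way to sanity-check the Phred+33 assumption before import is a simple character-range heuristic over FASTQ quality strings. The function below is a hypothetical sketch, not part of QIIME2; it can be inconclusive for uniformly high-quality reads and can misfire on very high-quality long-read data.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset from FASTQ quality strings.

    Characters below ';' (ASCII 59) only occur in Phred+33 data.
    Returns 33, 64, or None when the evidence is ambiguous.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:  # above 'J' (Q41 in Phred+33) suggests Phred+64
        return 64
    return None


print(guess_phred_offset(["IIIIHH#5AF"]))  # contains '#', so Phred+33
```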

  4. Trim PCR primers

    Client-supplied primers are removed with q2-cutadapt before DADA2. Untrimmed primers inflate chimera rates (Bokulich et al., 2018).

    Tools and outputs

    Tools used: QIIME2 2024.10 q2-cutadapt

    Output: trimmed-demux.qza; primer-trim statistics table
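To illustrate what primer removal involves, the sketch below expands IUPAC degenerate bases into a regular expression and strips a matching 5' primer. It is a simplified stand-in, not q2-cutadapt: it requires an exact match, whereas cutadapt tolerates a configurable error rate. The primer shown is a commonly cited 515F variant, used purely as an example.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a regex matching the primer at the start of a read."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def trim_primer(read, primer):
    """Return the read without its 5' primer, or None if the primer is absent."""
    match = primer_regex(primer).match(read)
    return read[match.end():] if match else None


PRIMER_515F = "GTGYCAGCMGCCGCGGTAA"  # example primer, assumed for illustration
read = "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGG"
print(trim_primer(read, PRIMER_515F))
```

Reads that return None here correspond to the untrimmed fraction reported in the primer-trim statistics.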

  5. Denoise to ASVs per sequencing run

    DADA2 runs separately on each sequencing run because error profiles differ by instrument and flowcell (Callahan et al., 2016; Anslan et al., 2021). Truncation lengths (`p-trunc-len-f/r`) are set from quality plots; paired-end merging requires sufficient overlap between reads. Low read retention triggers parameter review before proceeding.

    Tools and outputs

    Tools used: QIIME2 2024.10 q2-dada2 (denoise-paired, denoise-single, denoise-pyro, or denoise-ccs when scoped)

    Output: feature-table.qza; rep-seqs.qza; dada2-stats.qza with input, filtered, denoised, merged, and non-chimeric read counts per sample
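The read-retention review mentioned above can be sketched as a small calculation over the per-sample DADA2 stage counts. The dictionary keys, the example counts, and the 50% cutoff are assumptions for illustration only; the actual review threshold is project-specific.

```python
def retention(stats):
    """Fraction of input reads that survive to the non-chimeric stage."""
    return stats["non_chimeric"] / stats["input"] if stats["input"] else 0.0


def flag_low_retention(per_sample, threshold=0.5):
    """Return sorted sample IDs whose retention falls below the threshold."""
    return sorted(s for s, st in per_sample.items() if retention(st) < threshold)


# Made-up stage counts for two samples.
dada2_stats = {
    "S1": {"input": 50000, "filtered": 46000, "denoised": 45000,
           "merged": 42000, "non_chimeric": 40000},
    "S2": {"input": 50000, "filtered": 30000, "denoised": 29000,
           "merged": 16000, "non_chimeric": 15000},
}
print(flag_low_retention(dada2_stats))
```

A large drop between the denoised and merged counts, as in S2, usually points at truncation lengths that leave too little overlap.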

  6. Merge runs and filter contaminants

    Feature tables and representative sequences from separate runs are merged into a single cohort table. Mitochondrial, chloroplast, and unassigned features are removed; low-frequency features below the documented prevalence threshold are filtered (Nikodemova et al., 2023).

    Tools and outputs

    Tools used: QIIME2 2024.10 feature-table merge; filter-features; custom filtering scripts

    Output: table-filtered.qza; rep-seqs-filtered.qza; asv_filter_log.csv documenting removed features and thresholds
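The difference between a per-sample copy threshold and a cohort-level prevalence measure can be sketched on a toy feature table. The nested-dictionary layout, the function names, and the counts are hypothetical; the 10-read cutoff mirrors the example threshold cited from Nikodemova et al. (2023).

```python
def apply_per_sample_threshold(table, min_reads=10):
    """Zero out counts below min_reads within each sample and
    drop features that end up with no reads in any sample."""
    filtered = {}
    for feature, counts in table.items():
        kept = {s: (c if c >= min_reads else 0) for s, c in counts.items()}
        if any(kept.values()):
            filtered[feature] = kept
    return filtered


def prevalence(counts):
    """Fraction of samples in which a feature has at least one read."""
    return sum(1 for c in counts.values() if c > 0) / len(counts)


# Made-up counts: feature -> {sample: reads}.
table = {
    "ASV1": {"S1": 500, "S2": 3},
    "ASV2": {"S1": 4, "S2": 9},
    "ASV3": {"S1": 12, "S2": 15},
}
filtered = apply_per_sample_threshold(table)
print(filtered)
print({f: prevalence(c) for f, c in filtered.items()})
```

In this toy table ASV2 is dropped entirely, while ASV1 is kept but loses its 3-read observation in S2.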

  7. Assign taxonomy

    ASVs are classified with a Naive Bayes classifier trained on SILVA 138 (default) or Greengenes2 2022.10 when primers match the training amplicon (Bokulich et al., 2018). Classification confidence scores are retained for downstream filtering.

    Tools and outputs

    Tools used: QIIME2 2024.10 classify-sklearn

    Output: taxonomy.qza; taxonomy.tsv with Feature ID, Taxon, and Confidence
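Downstream filtering on the retained confidence scores could look like the sketch below, which reads a tab-separated table with the Feature ID, Taxon, and Confidence columns named above. The example rows, the helper names, and the 0.7 cutoff are assumptions for illustration, and the `g__` prefix assumes SILVA-style rank labels.

```python
import csv
import io


def low_confidence(tsv_text, min_confidence=0.7):
    """Return feature IDs whose classification confidence is below the cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [r["Feature ID"] for r in reader if float(r["Confidence"]) < min_confidence]


def genus_of(taxon):
    """Extract the genus from a semicolon-delimited taxon string, if present."""
    for rank in taxon.split(";"):
        rank = rank.strip()
        if rank.startswith("g__"):
            return rank[3:] or None
    return None


# Made-up rows in the exported taxonomy.tsv layout.
taxonomy_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "asv_001\td__Bacteria; p__Firmicutes; g__Lactobacillus\t0.98\n"
    "asv_002\td__Bacteria; p__Proteobacteria\t0.55\n"
)
print(low_confidence(taxonomy_tsv))
print(genus_of("d__Bacteria; p__Firmicutes; g__Lactobacillus"))
```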

  8. Build phylogeny and compute diversity

    Representative sequences are aligned and a rooted phylogenetic tree is built. Alpha diversity (Shannon, Faith phylogenetic diversity) and beta diversity (weighted and unweighted UniFrac) are then computed on rarefied or normalized tables, with the choice documented in the Methods draft. PERMANOVA tests group differences, with covariates recorded when provided (Anderson, 2001; McMurdie & Holmes, 2013).

    Tools and outputs

    Tools used: QIIME2 2024.10 phylogeny align-to-tree-mafft-fasttree; q2-diversity; phyloseq 1.48.0; vegan 2.6-8.1

    Output: rooted-tree.qza; diversity_results/ (alpha and beta metrics); permanova_results.csv; rarefaction curves
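For readers unfamiliar with the Shannon index reported in the alpha-diversity output, the sketch below computes it directly from ASV counts. This is a plain illustration of the formula H = -Σ p_i log(p_i), not the q2-diversity implementation; note that tools differ in the logarithm base they use, so the base is exposed as a parameter here.

```python
import math


def shannon(counts, base=math.e):
    """Shannon diversity from a list of feature counts in one sample."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log(c / total, base) for c in counts if c > 0
    )


even_community = [25, 25, 25, 25]   # four equally abundant ASVs
dominated_community = [97, 1, 1, 1]  # one ASV dominates
print(shannon(even_community, base=2))
print(shannon(dominated_community, base=2))
```

The even community scores higher than the dominated one even though both contain four ASVs, which is why Shannon is reported alongside richness-style metrics.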

  9. Test differential abundance

    ANCOM-BC is the default compositional differential-abundance method (Lin & Peddada, 2020). MaAsLin2 fits multivariable models with continuous and categorical covariates when scoped (Mallick et al., 2021). Contrasts, reference levels, and covariates are confirmed at kickoff.

    Tools and outputs

    Tools used: ANCOM-BC 2.4.0; MaAsLin2 1.18.0 (when scoped)

    Output: da_results_<contrast>.csv with taxon, effect size, p-value, q-value, and contrast metadata
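The reason a compositional method is the default can be shown with a toy calculation. The sketch below is not ANCOM-BC; it only illustrates, with made-up counts, how relative abundances can suggest a change in a taxon whose absolute abundance did not change.

```python
def relative_abundance(counts):
    """Convert absolute counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute abundances: only TaxonB changes between conditions.
before = {"TaxonA": 100, "TaxonB": 100}
after = {"TaxonA": 100, "TaxonB": 300}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)
print(rel_before["TaxonA"], rel_after["TaxonA"])
```

TaxonA appears to halve in relative terms purely because TaxonB bloomed, which is the kind of artifact that bias-corrected compositional models aim to avoid.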

  10. Package deliverables

    ASV tables, taxonomy, diversity results, figures, scripts, README, Methods draft, and HTML QC report are assembled per agreed retention policy.

    Tools and outputs

    Tools used: MultiQC 1.25.1; custom export and plotting scripts

    Output: Deliverable bundle with processed tables, figures, code, QC report, and Methods draft

What Pepkio delivers

Processed data files

  • asv_table.csv; asv_taxonomy.tsv; sample_metadata_merged.csv; dada2_denoising_stats.csv
  • diversity_alpha.csv; diversity_beta_distance_matrix.csv; permanova_results.csv; da_results_<contrast>.csv
  • .biom and .qza on request

Figures (PDF/SVG)

  • Quality boxplots; DADA2 read-retention bar chart; rarefaction curves
  • Phylum/genus stacked barplots; alpha-diversity boxplots by group
  • PCoA or NMDS ordination; differential-abundance effect-size or volcano plots

Tables

  • SampleID, ASV feature IDs, Kingdom–Genus, Confidence
  • DADA2 stage counts (input, filtered, denoised, merged, non-chimeric)
  • DA statistics (lfc or coef, p_val, q_val, contrast)

Code

  • Commented QIIME2 CLI export scripts plus R (phyloseq, ANCOM-BC) or Python analysis scripts
  • Conda lockfile or sessionInfo() export; delivery via private Git repository or agreed file transfer

Documentation

  • HTML/PDF QC report; README with reproduction steps
  • Methods draft listing tool versions, reference database builds, primer-trimming parameters, DADA2 truncation settings, and filtering thresholds
  • Post-delivery reviewer support within agreed scope (typically ≤20% of deliverables)

Technical decisions we make — and why

ASVs via DADA2, not 97% OTU clustering
DADA2 resolves single-nucleotide variants; mock-community benchmarks recovered more true variants and fewer spurious sequences than OTU picking (Callahan et al., 2016). Legacy OTU output is scoped on request.
Separate DADA2 per sequencing run, merge downstream
Pooling reads from different sequencing runs before denoising violates DADA2 error-model assumptions (Callahan et al., 2016). MGI and Illumina runs are denoised independently and merged at the feature-table stage (Anslan et al., 2021).
Default classifier: SILVA 138 Naive Bayes (classify-sklearn)
Greengenes2 2022.10 is used when client primers match the classifier training amplicon (e.g., 515F/806R V4). Classifier choice and confidence thresholds are documented because mis-assignment rates vary by hypervariable region (Bokulich et al., 2018; Johnson et al., 2019).
Default differential abundance: ANCOM-BC
Microbiome count data are compositional; ANCOM-BC accounts for sampling fraction and structural zeros (Lin & Peddada, 2020). MaAsLin2 is scoped for multivariable models with multiple covariates (Mallick et al., 2021).
Low-abundance filtering: per-sample copy threshold or cohort-level prevalence
Global filters removing features below 0.1% dataset abundance discard reproducible rare taxa; per-sample copy thresholds (e.g., ≥10 reads) improve replicate reliability while retaining more low-abundance signal (Nikodemova et al., 2023). Exact thresholds are documented in asv_filter_log.csv.

Common questions

What is the minimum sequencing depth and sample count for 16S amplicon analysis?

For high-biomass samples (stool, soil), Pepkio typically recommends ≥30,000–50,000 paired-end reads per sample, with rarefaction depth set from alpha-rarefaction curves at kickoff. Low-biomass swabs or biopsies often need higher depth. Standard projects cover ≤24 samples with one primary contrast; larger cohorts are scoped at kickoff.
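The trade-off behind choosing a rarefaction depth can be sketched as follows: a deeper cutoff keeps more reads per sample but excludes shallow samples. The function names, sample depths, and counts are made up for illustration, and the subsampling shown is a simple random draw without replacement rather than the q2-diversity implementation.

```python
import random
from collections import Counter


def samples_retained(depths, rarefaction_depth):
    """Sorted sample IDs with at least rarefaction_depth reads."""
    return sorted(s for s, d in depths.items() if d >= rarefaction_depth)


def rarefy(counts, depth, seed=0):
    """Subsample one sample's counts without replacement to `depth` reads.

    Returns None when the sample has fewer reads than the requested depth.
    """
    if sum(counts.values()) < depth:
        return None
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    return dict(Counter(random.Random(seed).sample(pool, depth)))


depths = {"S1": 42000, "S2": 8000, "S3": 35000}
print(samples_retained(depths, 30000))

rarefied = rarefy({"ASV1": 60, "ASV2": 40}, depth=50)
print(sum(rarefied.values()))
```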

Can you analyze poor-quality or low-yield 16S libraries?

Yes, with documented caveats. Samples with low read counts after DADA2 are flagged in dada2_denoising_stats.csv and may lack power for rare taxa or subgroup comparisons. Pepkio proceeds when clients accept reduced sensitivity; re-sequencing is discussed when input reads are very low for the matrix type.

Do you support Illumina, MGI DNBSEQ, Ion Torrent, and PacBio 16S data?

Yes for Illumina MiSeq, NextSeq, and NovaSeq paired- and single-end FASTQs. MGI DNBSEQ FASTQs (e.g., G400; 2×150 or 2×200) are supported with DADA2 parameters tuned per read length (Anslan et al., 2021). Ion Torrent data uses denoise-pyro; PacBio CCS amplicons use denoise-ccs when scoped at kickoff.

How long does a 16S amplicon analysis project take at Pepkio?

Standard projects (≤24 samples, one contrast, single or few sequencing runs) typically complete in 3–5 weeks. Multi-run merges, multi-contrast designs, PICRUSt2 extensions, or cohorts >24 samples may require 5–8 weeks—all confirmed at kickoff.

How do you handle batch effects across sequencing runs or extraction kits?

Sequencing run, extraction kit, and primer choice are recorded in metadata and QC reports. DADA2 runs per sequencing run; PERMANOVA and MaAsLin2 can include batch as a covariate when scoped. Abdill et al. (2025) showed primer and extraction kit associate with compositional variation—Pepkio documents these confounders rather than over-interpreting batch-driven shifts.

Do I own the code — and in what format is it delivered?

Yes — you retain full ownership of code, scripts, and results. Pepkio delivers commented QIIME2 export scripts plus R or Python analysis code with conda lockfiles. Jupyter or R Markdown notebooks are available on request.

Can I be involved during analysis?

Yes. Checkpoint reviews occur after raw QC, after DADA2 denoising (before differential testing), and before final delivery. You can review metadata, covariates, filtering thresholds, and contrasts within agreed scope. A PhD-level bioinformatician serves as your primary contact at each milestone.

What does post-delivery reviewer support include?

Clarification of methods, DADA2 parameters, filtering logic, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Substantial new analyses—additional contrasts, re-running with alternate databases, or PICRUSt2 extensions—are scoped as separate milestones.

Is co-authorship required?

No. Pepkio operates as a fee-for-service provider and does not require co-authorship unless explicitly discussed in advance.

Should we use V4 or V3–V4 primers for our study?

Primer choice affects taxonomic coverage and resolution. Johnson et al. (2019) showed short-read hypervariable-region amplicons cannot match full-length 16S species-level resolution; V4 (515F/806R) is widely used in Earth Microbiome Project protocols (Thompson et al., 2017). V3–V4 amplicons (~460 bp) offer broader coverage but require sufficient paired-end overlap—confirm read length against amplicon size at kickoff.
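The overlap check reduces to simple arithmetic: the forward and reverse reads, after truncation, must together span the amplicon with some bases to spare. The sketch below assumes a 12 bp minimum overlap, which is our understanding of the DADA2 merging default, and treats the amplicon length as measured between the read start positions; the truncation lengths are hypothetical.

```python
MIN_OVERLAP = 12  # assumed minimum overlap required for merging


def merge_overlap(trunc_f, trunc_r, amplicon_len):
    """Bases shared by forward and reverse reads after truncation."""
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=MIN_OVERLAP):
    """True when the truncated reads overlap by at least min_overlap bases."""
    return merge_overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


# ~460 bp V3-V4 amplicon with two hypothetical truncation choices.
print(merge_overlap(240, 200, 460), can_merge(240, 200, 460))  # 2x250 run
print(merge_overlap(280, 220, 460), can_merge(280, 220, 460))  # 2x300 run
```

Under these assumptions a 2×250 run truncated to 240/200 cannot merge a 460 bp amplicon, while a 2×300 run truncated to 280/220 leaves a 40 bp overlap.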

Why does Pepkio default to ASVs instead of 97% OTU clustering?

ASVs represent exact sequences and avoid arbitrary clustering thresholds that merge distinct variants (Callahan et al., 2016). Denoising removes spurious sequences more effectively than distance-based OTU picking in mock-community benchmarks. Legacy OTU output for method comparison is scoped on request.

Can you run PICRUSt2 or other custom non-standard 16S analyses?

Yes, when scoped at kickoff. PICRUSt2 functional inference, co-occurrence or network analysis, pre-built ASV table re-analysis with alternate databases, client-specified differential-abundance models, and custom reference classifiers are agreed before analysis begins.

Related services

  • Shotgun metagenomics: Species- and strain-level taxonomic profiling plus direct functional pathway quantification when 16S resolution is insufficient.
  • Metatranscriptomics: Active gene expression profiling to complement taxonomic snapshots from 16S amplicons.
  • Metagenomics analysis services: Hub page comparing 16S, shotgun, and metatranscriptomic entry points with shared QC standards.
  • Custom consulting: Amplicon-vs-shotgun feasibility review and primer selection before committing to sequencing.

References
  1. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nature Methods. 2016;13(7):581–583. https://doi.org/10.1038/nmeth.3869 (PMID: 27214047)
  2. Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, Huttley GA, Caporaso JG. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. Microbiome. 2018;6:90. https://doi.org/10.1186/s40168-018-0470-z (PMID: 29773078)
  3. Bolyen E, Rideout JR, Dillon MR, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 2019;37(8):852–857. https://doi.org/10.1038/s41587-019-0209-9 (PMID: 31341288)
  4. Johnson JS, Spakowicz DJ, Hong BY, et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nature Communications. 2019;10:5029. https://doi.org/10.1038/s41467-019-13036-1 (PMID: 31695033)
  5. McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8(4):e61217. https://doi.org/10.1371/journal.pone.0061217 (PMID: 23630581)
  6. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nature Communications. 2020;11:3514. https://doi.org/10.1038/s41467-020-17041-7 (PMID: 32665548)
  7. Mallick H, Rahnavard A, McIver LJ, et al. Multivariable association discovery in population-scale meta-omics studies. PLoS Computational Biology. 2021;17(11):e1009442. https://doi.org/10.1371/journal.pcbi.1009442 (PMID: 34784344)
  8. Mas-Lloret J, Obón-Santacana M, Ibáñez-Sanz G, et al. Gut microbiome diversity detected by high-coverage 16S and shotgun sequencing of paired stool and colon sample. Scientific Data. 2020;7:92. https://doi.org/10.1038/s41597-020-0427-5 (PMID: 32179734)
  9. Lloyd-Price J, Arze C, Ananthakrishnan AN, et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019;569(7758):655–662. https://doi.org/10.1038/s41586-019-1237-9 (PMID: 31142855)
  10. Thompson LR, Sanders JG, McDonald D, et al. A communal catalogue reveals Earth's multiscale microbial diversity. Nature. 2017;551(7681):457–463. https://doi.org/10.1038/nature24621 (PMID: 29088705)
  11. Anslan S, Mikryukov V, Armolaitis K, et al. Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms. PeerJ. 2021;9:e12254. https://doi.org/10.7717/peerj.12254 (PMID: 34703674)
  12. Nikodemova M, Holzhausen EA, Deblois CL, et al. The effect of low-abundance OTU filtering methods on the reliability and variability of microbial composition assessed by 16S rRNA amplicon sequencing. Frontiers in Cellular and Infection Microbiology. 2023;13:1165295. https://doi.org/10.3389/fcimb.2023.1165295 (PMID: 37377642)
  13. Abdill RJ, Graham SP, Rubinetti V, Ahmadian M, Hicks P, Chetty A, McDonald D, Ferretti P, Gibbons E, Rossi M, Krishnan A, Albert FW, Greene CS, Davis S, Blekhman R. Integration of 168,000 samples reveals global patterns of the human gut microbiome. Cell. 2025;188(4):1100–1118.e17. https://doi.org/10.1016/j.cell.2024.12.017 (PMID: 39848248)
  14. Kumar A, Sun R, Habib B, et al. Identification of medication–microbiome interactions that affect gut infection. Nature. 2025;644:506–515. https://doi.org/10.1038/s41586-025-09273-8 (PMID: 40670788)
  15. Hillmann B, Al-Ghalith GA, Shields-Cutler RR, et al. Evaluating the information content of shallow shotgun metagenomics. mSystems. 2018;3(6):e00069-18. https://doi.org/10.1128/mSystems.00069-18 (PMID: 30443602)
  16. Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecology. 2001;26(1):32–46. https://doi.org/10.1111/j.1442-9993.2001.01070.x
  17. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. https://doi.org/10.1093/bioinformatics/btw354 (PMID: 27312411)
  18. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. https://doi.org/10.1093/bioinformatics/bty624 (PMID: 30423086)

Let's Talk About Your Science

Tell us:

  • Your biological question
  • Data type and size
  • Timeline constraints

We'll tell you:

  • What's feasible
  • How long it will take
  • Exactly what it will cost

Contact us to start with a free consultation. Need everyday bench calculators? Try our free lab tools.