Bioinformatics analysis service

Metagenomics Analysis Services — Reproducible Taxonomic and Functional Profiling from 16S, Shotgun, and Metatranscriptomic Data

Metagenomics analysis profiles the taxonomic composition and functional potential of microbial communities from 16S amplicon, shotgun, or metatranscriptomic data. Pepkio delivers version-pinned QC, profiling, diversity, and differential-abundance workflows with full code, figures, and a Methods draft for academic, biotech, and pharma teams. Custom inputs, outputs, and non-standard analyses are scoped at kickoff.

Key facts

Key facts about metagenomics analysis
| Fact | Value |
| --- | --- |
| Data types supported | Illumina paired- and single-end FASTQ (16S/ITS amplicon, shotgun metagenomic WGS, metatranscriptomic RNA-seq); pre-built ASV tables, Kraken2/MetaPhlAn profiles, or HUMAnN outputs accepted on request |
| Reference builds or standards used | SILVA 138 (16S/18S); Greengenes2 2022.10; UNITE 9.0 (ITS); GTDB release 214; NCBI RefSeq; ChocoPhlAn SGB (Jun 2023) for HUMAnN 3.9; host removal against GRCh38 or GRCm39 when scoped |
| Primary tools (with versions) | QIIME2 2024.10; MetaPhlAn 4.1.0; HUMAnN 3.9; Kraken2 2.1.3; Bracken 2.9; MEGAHIT 1.2.9; MetaBAT2 2.17; ANCOM-BC 2.4.0; MaAsLin2 1.18.0; fastp 0.24.0; MultiQC 1.25.1 (full list in workflow and Tools sections) |
| Typical turnaround range | 3–5 weeks (standard 16S or shallow shotgun cohort, ≤24 samples, one contrast); 5–8 weeks (deep shotgun with MAG recovery, metatranscriptomics, or multi-contrast designs); confirmed at kickoff |
| Deliverable formats | ASV/OTU and abundance tables (`.csv`, `.biom`, `.qza` on request); MetaPhlAn/Kraken2 profiles; HUMAnN `pathabundance` and `genefamilies`; diversity metrics; differential-abundance tables; PDF/SVG figures; HTML MultiQC report; commented R/Python scripts; Methods draft |
| Regulatory/reproducibility standards followed | Version-pinned conda environments; sample manifests; containerized workflows aligned with CAMI II reproducibility guidance (Meyer et al., 2022); private Git archival on request |
| Custom / bespoke analysis | Non-standard inputs, outputs, and methods scoped at kickoff, e.g., StrainPhlAn tracking, PICRUSt2 functional inference from 16S, co-assembly/binning extensions, custom reference databases, or client-specified statistical models |

Key terms: Metagenomics studies mixed microbial communities without culturing. An amplicon sequence variant (ASV) is an exact, denoised amplicon sequence inferred from 16S/ITS reads (Callahan et al., 2016). A metagenome-assembled genome (MAG) is a draft genome assembled and binned from shotgun reads. Metatranscriptomics profiles the genes a community is actively expressing.

What is metagenomics?

Metagenomics sequences DNA or RNA from mixed microbial communities to identify which organisms are present, what genes they carry, and—with metatranscriptomics—which genes are actively expressed, without isolating individual strains. It answers: who is there, in what proportions, and what are they doing? Unlike culture-based microbiology, metagenomics captures uncultured taxa from stool, soil, wastewater, or clinical swabs. In a high-coverage human gut study, shotgun metagenomes ranged from 47 to 92 million reads per sample alongside >300,000 16S reads per sample (Mas-Lloret et al., 2020). Pepkio supports 16S, shotgun WGS, and metatranscriptomic entry points with modality-specific pipelines.

What metagenomics analysis can answer

Metagenomics links microbial community structure to disease, treatment response, and environmental change. Representative questions with published examples:

  • Which gut taxa and functions differ between IBD patients and healthy controls? The IBD Multi'omics Database profiled 132 longitudinal subjects and linked microbiome shifts to disease activity and treatment response (Lloyd-Price et al., 2019).
  • Does donor strain engraftment predict FMT success in recurrent C. difficile infection? Aggarwala et al. (2021) used shotgun metagenomics to quantify donor strain engraftment and found it explained clinical relapse versus remission with 100% precision and 95% recall across 13 FMT interventions.
  • Which metabolic pathways shift after antibiotic exposure? Franzosa et al. (2018) showed that paired metagenomic and metatranscriptomic profiling resolves pathway-level responses that taxonomic profiles alone miss in the human gut microbiome.
  • How does soil microbial functional gene composition change across elevation? Yang et al. (2014) profiled grassland soil metagenomes along a Tibetan elevation gradient and found stress and nutrient-cycling genes shifted with elevation and soil conditions.
  • Which bacterial species, strains, and phage shift during early infant gut colonization? Sharon et al. (2013) applied time-series community genomics to infant stool and reported rapid turnover in bacterial species, strains, and phage during early-life colonization.

Services included in this category

Metagenomics services offered by Pepkio
| Service | Description | Primary tools |
| --- | --- | --- |
| Shotgun metagenomics | Species- and strain-level taxonomic profiling, functional pathway quantification, and optional MAG recovery from whole-community DNA | MetaPhlAn 4.1.0, HUMAnN 3.9, Kraken2 2.1.3, MEGAHIT 1.2.9, MetaBAT2 2.17 |
| 16S rRNA amplicon sequencing | ASV-resolved community profiling from hypervariable-region amplicons for cost-effective cohort studies | QIIME2 2024.10, DADA2 (q2-dada2), SILVA 138 / Greengenes2 2022.10 |
| Metatranscriptomics | Active gene expression profiling from community RNA after rRNA depletion | SortMeRNA 4.3.7, Salmon 1.10.3, HUMAnN 3.9 |

What Pepkio delivers

Typical deliverables: ASV or taxonomic abundance tables (`.csv`, `.biom`, `.qza` on request); HUMAnN `pathabundance` and `genefamilies`; alpha/beta diversity metrics and PERMANOVA results; differential-abundance tables from ANCOM-BC or MaAsLin2; PDF/SVG figures (MultiQC, rarefaction, barplots, heatmaps); commented R/Python scripts with conda lockfile; `sample_manifest.csv`; HTML QC report; README; Methods draft with tool versions and database builds. Post-delivery support covers methods clarification and minor revisions within agreed scope (typically ≤20% of deliverables).

How the analysis works — step by step

  1. Validate inputs and sample metadata

    Confirm FASTQ integrity, read layout, platform, and design; record IDs, batch, condition, and covariates in `sample_manifest.csv`.

    Tools and outputs

    Tools used: Custom validation scripts; MD5 checksum verification

    Output: sample_manifest.csv

  2. QC and trim raw reads

    Assess adapters, low-quality tails, and overrepresented sequences; aggregate metrics for review (Ewels et al., 2016).

    Tools and outputs

    Tools used: FastQC 0.12.1; fastp 0.24.0; MultiQC 1.25.1

    Output: Per-sample reports; multiqc_report.html

  3. Modality-specific preprocessing

    16S reads are denoised to ASVs (Callahan et al., 2016). Shotgun reads undergo optional host removal (GRCh38/GRCm39). Metatranscriptomic reads are filtered for rRNA (Franzosa et al., 2018).

    Tools and outputs

    Tools used: QIIME2 2024.10 q2-dada2; Bowtie2 2.5.4; SortMeRNA 4.3.7

    Output: ASV table or host-depleted/rRNA-filtered FASTQ

  4. Taxonomic profiling

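    MetaPhlAn 4.1.0 profiles taxa by mapping reads to clade-specific marker genes (Blanco-Míguez et al., 2023). Kraken2 classifies reads by k-mer matching, and Bracken re-estimates abundances from those assignments (Wood et al., 2019; Lu et al., 2017). QIIME2 assigns ASV taxonomy against SILVA or Greengenes2.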

    Tools and outputs

    Tools used: MetaPhlAn 4.1.0; Kraken2 2.1.3 + Bracken 2.9; QIIME2 classify-sklearn

    Output: Relative-abundance profiles; taxonomy-annotated ASV table

  5. Functional profiling

    HUMAnN 3.9 quantifies pathways and gene families from shotgun or metatranscriptomic reads (Franzosa et al., 2018). PICRUSt2 from 16S ASVs is scoped on request.

    Tools and outputs

    Tools used: HUMAnN 3.9; PICRUSt2 2.5.2 (on request)

    Output: pathabundance.tsv; genefamilies.tsv

  6. Diversity and ordination

    Alpha and beta diversity are computed on rarefied or transformed tables, followed by PCoA/NMDS ordination and PERMANOVA testing (McMurdie & Holmes, 2013; Anderson, 2001).

    Tools and outputs

    Tools used: QIIME2 q2-diversity; phyloseq 1.48.0; vegan 2.6-8.1

    Output: Diversity plots; PERMANOVA tables

  7. Differential abundance testing

    Differential abundance is tested with ANCOM-BC or MaAsLin2, with DESeq2 used when it is scoped and appropriate for the design (Lin & Peddada, 2020; Mallick et al., 2021).

    Tools and outputs

    Tools used: ANCOM-BC 2.4.0; MaAsLin2 1.18.0; DESeq2 1.52.0 (when scoped)

    Output: da_results_<contrast>.csv

  8. Package figures, scripts, and Methods draft

    Assemble figures, scripts, and a Methods draft with pinned versions. MAG co-assembly (MEGAHIT + MetaBAT2 + CheckM2) is scoped for shotgun projects (Meyer et al., 2022).

    Tools and outputs

    Tools used: R/Python plotting scripts; documented workflow archive

    Output: PDF/SVG figures; code; README; Methods draft
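
The diversity step in the workflow above can be illustrated with a minimal sketch. It computes the Shannon index (an alpha-diversity metric) and Bray-Curtis dissimilarity (a beta-diversity metric) in plain Python on toy counts. The function names and the count vectors are hypothetical, and the sketch skips rarefaction and transformation; it is not the q2-diversity, phyloseq, or vegan implementation used in delivery.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' (natural log) for one sample's feature counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing, so they are skipped to avoid log(0).
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same features."""
    if len(a) != len(b):
        raise ValueError("vectors must list the same features in the same order")
    denom = sum(a) + sum(b)
    if denom == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denom

# Toy ASV counts (illustrative values only, not real data).
sample_a = [10, 10, 10, 10]   # perfectly even community
sample_b = [40, 0, 0, 0]      # single dominant taxon

print(round(shannon_index(sample_a), 4))  # 1.3863, i.e. ln(4)
print(shannon_index(sample_b))            # 0.0
print(bray_curtis(sample_a, sample_b))    # 0.75
```

An even community maximizes the Shannon index for a given richness, while a community dominated by one taxon scores zero. The Bray-Curtis values between all sample pairs form the distance matrix that PCoA and PERMANOVA operate on.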

Tools and standards we use

Metagenomics tools and standards
| Tool | Version | Role | Primary citation |
| --- | --- | --- | --- |
| QIIME2 | 2024.10 | Amplicon workflow orchestration, diversity, taxonomy | Bolyen et al. (2019), doi:10.1038/s41587-019-0209-9 |
| DADA2 (q2-dada2) | 2024.10.0 | ASV inference and chimera removal | Callahan et al. (2016), doi:10.1038/nmeth.3869 |
| MetaPhlAn | 4.1.0 | Shotgun taxonomic profiling | Blanco-Míguez et al. (2023), doi:10.1038/s41587-023-01688-w |
| HUMAnN | 3.9 | Functional pathway quantification | Franzosa et al. (2018), doi:10.1038/s41592-018-0176-y |
| Kraken2 | 2.1.3 | k-mer taxonomic classification | Wood et al. (2019), doi:10.1186/s13059-019-1891-0 |
| Bracken | 2.9 | Abundance re-estimation from Kraken2 | Lu et al. (2017), doi:10.7717/peerj-cs.104 |
| MEGAHIT | 1.2.9 | Metagenome assembly | Li et al. (2015), doi:10.1093/bioinformatics/btv033 |
| MetaBAT2 | 2.17 | Genome binning | Kang et al. (2019), doi:10.7717/peerj.7359 |
| CheckM2 | 1.0.2 | MAG quality assessment | Chklovski et al. (2023), doi:10.1038/s41592-023-01940-w |
| SortMeRNA | 4.3.7 | rRNA filtering (metatranscriptomics) | Kopylova et al. (2012), doi:10.1093/bioinformatics/bts611 |
| ANCOM-BC | 2.4.0 | Differential abundance (compositional) | Lin & Peddada (2020), doi:10.1038/s41467-020-17041-7 |
| MaAsLin2 | 1.18.0 | Multivariable association testing | Mallick et al. (2021), doi:10.1371/journal.pcbi.1009442 |
| fastp | 0.24.0 | Adapter trimming and QC | Chen et al. (2018), doi:10.1093/bioinformatics/bty624 |
| MultiQC | 1.25.1 | Aggregated QC reporting | Ewels et al. (2016), doi:10.1093/bioinformatics/btw354 |

Common challenges — and how we handle them

Pipeline and database choice shifts taxonomic calls.
DNA extraction, library prep, sequencing platform, and bioinformatics each influence metagenomic profiles (Gulyás et al., 2024). Pepkio locks tool versions and database builds at kickoff and records them in the Methods draft.
Related strains confound profiling and binning.
CAMI II found closely related strains reduce assembly and binning accuracy (Meyer et al., 2022). Pepkio documents strain-resolution limits and scopes StrainPhlAn or MAG co-assembly when needed.
Host DNA or rRNA dominates reads.
Host contamination and rRNA overload reduce effective microbial depth (Franzosa et al., 2018). Pepkio reports removal rates at preprocessing and flags insufficient depth before testing.
Insufficient depth for species-level resolution.
Shallow shotgun can outperform 16S on taxonomic resolution and reproducibility in dense longitudinal designs (La Reau et al., 2023), but metagenome-wide association studies (MWAS) benefit from ≥15 million reads per sample (Chen et al., 2021). Pepkio audits depth against modality benchmarks before differential testing.
Poor reproducibility across studies.
CAMI II emphasized reproducible, containerized workflows (Meyer et al., 2022). Pepkio delivers conda lockfiles, sample manifests, and optional private Git archives.
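
As a rough illustration of the version locking described above, the sketch below compares a pinned tool list against the versions found in an environment and reports any drift. The pinned versions are taken from this page. The function name and the `observed` dictionary are hypothetical, and how the observed versions are collected (for example from a conda export) is left out.

```python
# Pinned versions as listed on this page (subset).
PINNED = {
    "fastp": "0.24.0",
    "MultiQC": "1.25.1",
    "Kraken2": "2.1.3",
    "Bracken": "2.9",
    "MetaPhlAn": "4.1.0",
}

def find_version_drift(pinned, observed):
    """Return {tool: (pinned, observed)} for each tool that is missing or mismatched."""
    drift = {}
    for tool, want in pinned.items():
        have = observed.get(tool)  # None when the tool is not installed
        if have != want:
            drift[tool] = (want, have)
    return drift

# Hypothetical environment: Kraken2 is one patch behind and Bracken is absent.
observed = dict(PINNED, Kraken2="2.1.2")
del observed["Bracken"]

print(find_version_drift(PINNED, observed))
# {'Kraken2': ('2.1.3', '2.1.2'), 'Bracken': ('2.9', None)}
```

An empty result means the environment matches the pins recorded in the Methods draft. Strings are compared exactly on purpose, since even a patch-level change in a classifier or database can shift taxonomic calls.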

Common questions

What data do I need to provide for a metagenomics analysis project?

Demultiplexed FASTQ files (or pre-processed ASV/abundance tables), sample metadata with conditions and covariates, and a brief study description. For 16S, include primers and hypervariable regions; for shotgun or metatranscriptomics, note sample matrix and host species for depletion.
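
A simple way to check a submission before upload is sketched below, assuming a CSV manifest with one row per FASTQ file. The column names (`sample_id`, `fastq_path`, `md5`, `condition`, `batch`) and the function names are hypothetical choices for illustration, not Pepkio's required schema; the actual manifest layout is agreed at kickoff.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical column set; the real manifest schema is agreed at kickoff.
REQUIRED_COLUMNS = ("sample_id", "fastq_path", "md5", "condition", "batch")

def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large FASTQs fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_manifest(manifest_path):
    """Return a list of human-readable problems; an empty list means the manifest passed."""
    problems = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing columns: {', '.join(missing)}"]
        seen = set()
        for row in reader:
            sample_id = row["sample_id"]
            if sample_id in seen:
                problems.append(f"{sample_id}: duplicate sample_id")
            seen.add(sample_id)
            fastq = Path(row["fastq_path"])
            if not fastq.is_file():
                problems.append(f"{sample_id}: file not found: {fastq}")
            elif md5sum(fastq) != row["md5"].strip().lower():
                problems.append(f"{sample_id}: MD5 mismatch for {fastq}")
    return problems
```

Calling `validate_manifest("sample_manifest.csv")` returns an empty list when every file exists and matches its recorded checksum. Any returned messages point to files worth re-transferring before the project starts.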

How long does metagenomics analysis take at Pepkio?

Standard projects (roughly 4–24 samples, one contrast, profiling through differential abundance) typically complete in 3–5 weeks. MAG recovery, metatranscriptomics, multi-contrast designs, or >24 samples may take 5–8 weeks. Milestone check-ins occur during QC, after profiling, and before delivery; exact timelines are confirmed at kickoff.

What do the deliverables look like?

Abundance tables, differential-abundance results, diversity metrics, PDF/SVG figures, an HTML MultiQC report, commented R/Python scripts with lockfiles, a Methods draft, and a README with reproduction steps.

Can you handle my sequencing platform or instrument?

Yes for Illumina paired- and single-end data from NovaSeq, NextSeq, and MiSeq. Ion Torrent 16S (denoise-pyro) and PacBio CCS amplicons (denoise-ccs) are supported when scoped at kickoff.

What if my data quality or sequencing depth is low?

Low-quality libraries are analyzed with caveats in the QC report. Samples below depth thresholds—e.g., <15 million shotgun reads for species-level MWAS (Chen et al., 2021)—may lack power for rare taxa. Outliers are flagged before testing.
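
The depth check can be pictured with a short sketch that flags samples under the 15 million read threshold quoted above. The function name and the sample read counts are hypothetical. In practice the counts would come from the QC report, and the threshold depends on modality and study goal.

```python
MIN_SHOTGUN_READS = 15_000_000  # threshold quoted in the text (Chen et al., 2021)

def flag_low_depth(read_counts, minimum=MIN_SHOTGUN_READS):
    """Return sample IDs with fewer than `minimum` reads, shallowest first."""
    low = [(n, sample_id) for sample_id, n in read_counts.items() if n < minimum]
    return [sample_id for n, sample_id in sorted(low)]

# Hypothetical post-QC read counts per sample.
depths = {
    "S01": 22_400_000,
    "S02": 9_800_000,
    "S03": 14_999_999,
    "S04": 15_000_000,
}

print(flag_low_depth(depths))  # ['S02', 'S03']
```

A sample exactly at the threshold passes because the comparison is strict. Flagged samples are candidates for resequencing or for reporting with reduced-power caveats.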

Do you provide the code, and can I reproduce the results?

Yes—you retain full ownership. Pepkio delivers commented R or Python scripts with conda lockfiles or sessionInfo() exports to rerun from raw FASTQs on Linux or HPC.

Can I be involved during the analysis?

Yes. Checkpoint reviews occur after QC, after profiling, and before delivery. You can review metadata, covariates, filtering, and contrasts within agreed scope. A dedicated scientific contact leads the project.

What happens if a reviewer requests changes after delivery?

Clarification of methods and minor figure or table revisions within agreed scope (typically ≤20% of deliverables) are covered. Substantial new analyses are scoped as separate milestones, consistent with Pepkio's reviewer-support policy.

Should I choose 16S amplicon or shotgun metagenomics?

16S is cost-effective for large cohorts but has poor species and functional resolution (La Reau et al., 2023). Shotgun resolves species-level taxa and pathway content at higher cost. In a densely sampled longitudinal stool cohort, shallow shotgun showed lower technical variation and higher taxonomic resolution than 16S at substantially lower cost than deep shotgun (La Reau et al., 2023). Modality is confirmed at kickoff.

Can you run custom or non-standard metagenomics analyses?

Yes—StrainPhlAn tracking, MAG binning, PICRUSt2, host-transcriptomics integration, custom databases, and client-specified models are scoped at kickoff. Pre-built profiles and non-Illumina data are accepted when feasibility is confirmed during intake.

References
  1. Meyer F, Fritz A, Deng Z-L, et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nature Methods. 2022;19(4):429–440. https://doi.org/10.1038/s41592-022-01431-4 (PMID: 35396482)
  2. Blanco-Míguez A, Beghini F, Cumbo F, et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nature Biotechnology. 2023;41(4):555–568. https://doi.org/10.1038/s41587-023-01688-w (PMID: 36823356)
  3. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nature Methods. 2016;13(7):581–583. https://doi.org/10.1038/nmeth.3869 (PMID: 27214047)
  4. Franzosa EA, McIver LJ, Rahnavard G, et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nature Methods. 2018;15(11):962–968. https://doi.org/10.1038/s41592-018-0176-y (PMID: 30377376)
  5. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology. 2019;20(1):257. https://doi.org/10.1186/s13059-019-1891-0 (PMID: 31779668)
  6. La Reau AJ, Strom NB, Filvaroff E, et al. Shallow shotgun sequencing reduces technical variation in microbiome analysis. Scientific Reports. 2023;13:7668. https://doi.org/10.1038/s41598-023-33489-1 (PMID: 37169816)
  7. Chen C, Wang Y, Hu J, et al. Analysis and evaluation of different sequencing depths from 5 to 20 million reads in shotgun metagenomic sequencing, with optimal minimum depth being recommended. Genome. 2021;64(12):1111–1121. https://doi.org/10.1139/gen-2021-0120 (PMID: 35939836)
  8. Mas-Lloret J, Obón-Santacana M, Ibáñez-Sanz G, et al. Gut microbiome diversity detected by high-coverage 16S and shotgun sequencing of paired stool and colon sample. Scientific Data. 2020;7:92. https://doi.org/10.1038/s41597-020-0427-5 (PMID: 32179734)
  9. Lloyd-Price J, Arze C, Ananthakrishnan AN, et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019;569(7758):655–662. https://doi.org/10.1038/s41586-019-1237-9 (PMID: 31142855)
  10. Aggarwala V, Mogno I, Li Z, et al. Precise quantification of bacterial strains after fecal microbiota transplantation delineates long-term engraftment and explains outcomes. Nature Microbiology. 2021;6(10):1309–1318. https://doi.org/10.1038/s41564-021-00966-0 (PMID: 34580445)
  11. Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Research. 2013;23(1):111–120. https://doi.org/10.1101/gr.142315.112 (PMID: 22936250)
  12. Yang Y, Gao Y, Wang S, et al. The microbial gene diversity along an elevation gradient of the Tibetan grassland. The ISME Journal. 2014;8(2):430–440. https://doi.org/10.1038/ismej.2013.146 (PMID: 23985745)
  13. Gulyás G, Kakuk B, Dörmő Á, et al. Cross-comparison of gut metagenomic profiling strategies. Communications Biology. 2024;7:715. https://doi.org/10.1038/s42003-024-07158-6 (PMID: 39505993)
  14. Bolyen E, Rideout JR, Dillon MR, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 2019;37(8):852–857. https://doi.org/10.1038/s41587-019-0209-9 (PMID: 31341288)
  15. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. https://doi.org/10.1093/bioinformatics/btv033 (PMID: 25609793)
  16. Mallick H, Rahnavard G, McIver LJ, et al. Multivariable association discovery in population-scale meta-omics studies. PLOS Computational Biology. 2021;17(11):e1009442. https://doi.org/10.1371/journal.pcbi.1009442 (PMID: 34784344)
  17. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. https://doi.org/10.1093/bioinformatics/btw354 (PMID: 27312411)
  18. McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8(4):e61217. https://doi.org/10.1371/journal.pone.0061217 (PMID: 23630581)
  19. Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecology. 2001;26(1):32–46. https://doi.org/10.1046/j.1442-9993.2001.01070.x
  20. Lu J, Breitwieser FP, Thielen P, Salzberg SL. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science. 2017;3:e104. https://doi.org/10.7717/peerj-cs.104
  21. Kang DD, Li F, Kirton E, et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from complex metagenomic habitats. PeerJ. 2019;7:e7359. https://doi.org/10.7717/peerj.7359 (PMID: 31388474)
  22. Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nature Methods. 2023;20(8):1203–1212. https://doi.org/10.1038/s41592-023-01940-w (PMID: 37500759)
  23. Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. 2012;28(24):3211–3217. https://doi.org/10.1093/bioinformatics/bts611 (PMID: 23071270)
  24. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nature Communications. 2020;11:3514. https://doi.org/10.1038/s41467-020-17041-7 (PMID: 32665548)
  25. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. https://doi.org/10.1093/bioinformatics/bty624 (PMID: 30423086)

Let's Talk About Your Science

Tell us:

  • Your biological question
  • Data type and size
  • Timeline constraints

We'll tell you:

  • What's feasible
  • How long it will take
  • Exactly what it will cost

Contact us to start with a free consultation. Need everyday bench calculators? Try our free lab tools.