
Shotgun Metagenomics Analysis Service — Species- and Pathway-Level Profiles from Raw FASTQs with Dual MetaPhlAn4/Kraken2 Profiling

Shotgun metagenomics profiles taxonomic composition and functional potential from untargeted DNA sequencing (Quince et al., 2017). Pepkio delivers version-pinned QC, dual taxonomic profilers, HUMAnN pathway quantification, and bespoke analyses scoped at kickoff for academic, biotech, and pharma studies. In mock-community benchmarks, reference-based strain-level taxonomy is reliable at 0.5–1.0 Gb per sample (Treichel et al., 2026). Scripts, figures, and a Methods draft are included.

Key facts

Key facts about Shotgun Metagenomics
Fact	Value
Supported platforms / instruments	Illumina NovaSeq X / 6000 / NextSeq 2000, MiSeq; MGI DNBSEQ T7, G400, and G99 (paired-end FASTQ; read-header normalization when scoped at kickoff). Cross-platform profiles can be compared when library prep, profiling strategy, and depth are harmonized (Gulyás et al., 2024)
Input requirements	Paired-end FASTQ (≥2×100–150 bp typical); ≥0.5–1.0 Gb for reference-based species/strain profiling (Treichel et al., 2026); ≥2 Gb for pathway-level completeness (Treichel et al., 2026); ≥15 million reads for species-level MWAS (Liu et al., 2021); ≥10 Gb for MAG co-assembly when scoped (Treichel et al., 2026; Meyer et al., 2022). Sample count and contrasts confirmed at kickoff
Reference builds supported	GTDB release 214; Kraken2 PlusPF / PlusPF16G; ChocoPhlAn SGB (Jun 2023) for HUMAnN 3.9; host removal against GRCh38 or GRCm39; custom reference databases scoped at kickoff
Primary tools (with versions)	fastp 0.24.0; FastQC 0.12.1; KneadData 0.12.0; Bowtie2 2.5.4; MetaPhlAn 4.1.0; Kraken2 2.1.3; Bracken 2.9; HUMAnN 3.9; MEGAHIT 1.2.9; MetaBAT2 2.17; CheckM2 1.0.2 (MAG, on request); ANCOM-BC 2.4.0; MaAsLin2 1.18.0; phyloseq 1.48.0; vegan 2.6-8.1; MultiQC 1.25.1
Typical turnaround time	3–5 weeks (standard cohort ≤24 samples, profiling + one contrast); 5–8 weeks (deep shotgun, MAG recovery, multi-contrast designs) — confirmed at kickoff
Deliverable formats	MetaPhlAn/Kraken abundance tables (.tsv, .csv); HUMAnN pathabundance and genefamilies; diversity and differential-abundance tables; PDF/SVG figures; HTML MultiQC report; commented R/Python scripts; Methods draft
Key cited best-practice reference	Quince et al. (2017), Nature Biotechnology; Meyer et al. (2022), Nature Methods (CAMI II); Treichel et al. (2026), Nature Microbiology
Custom / bespoke analysis	StrainPhlAn tracking, custom Kraken2/MetaPhlAn databases, MAG binning, client-specified statistical models, non-standard inputs or outputs — scoped at kickoff

What is shotgun metagenomics?

Shotgun metagenomics sequences all DNA in a mixed microbial community, without PCR amplification of a single marker gene, then assigns reads to taxa and functional categories by reference mapping or assembly (Quince et al., 2017). Unlike 16S amplicon sequencing, it can profile bacteria, archaea, fungi, and viruses at higher taxonomic resolution when reference databases and sequencing depth support detection, and it quantifies pathways directly from genomic reads rather than inferring them from marker-gene proxies. In one study of paired stool samples, shotgun metagenomes yielded 47–92 million reads, compared with >300,000 reads from high-coverage 16S sequencing (Mas-Lloret et al., 2020). Pepkio processes Illumina and MGI FASTQs through host depletion, dual taxonomic profiling, and HUMAnN; MAG recovery and bespoke extensions are agreed at kickoff. See the shotgun metagenomics glossary.

When should you use shotgun metagenomics?

Shotgun metagenomics fits when you need species- or strain-level taxonomy, direct functional pathway quantification, or detection of non-bacterial community members. The table contrasts shotgun with 16S amplicon and metatranscriptomic alternatives.

Comparison of shotgun metagenomics, 16S amplicon, and metatranscriptomics
Approach	Best for	Limitations	Approximate cost range
Shotgun metagenomics	Species/strain taxonomy, MetaCyc pathway quantification, viruses/fungi/archaea	Higher per-sample cost than 16S; host DNA reduces effective microbial depth; MAG and proteome goals need deep sequencing (>10 Gb) (Treichel et al., 2026)	Library prep + sequencing + bioinformatics vary by depth, sample count, and MAG scope
16S rRNA amplicon	Large cohorts, cost-effective community-shift detection	Poor species and functional resolution relative to shotgun (La Reau et al., 2023)	Lowest sequencing cost per sample
Metatranscriptomics	Active gene expression and treatment-response dynamics	Requires rRNA depletion; RNA is less stable than DNA; higher lab complexity (Franzosa et al., 2018)	Higher prep and compute costs than DNA shotgun

Published examples:
IBD and disease activity: Lloyd-Price et al. (2019) linked microbiome shifts to disease activity in 132 longitudinal IBD subjects.
FMT engraftment: Aggarwala et al. (2021) quantified donor strain engraftment in 13 FMT interventions (100% precision, 95% recall for relapse vs remission in that cohort).
Infant colonization: Sharon et al. (2013) reported rapid turnover in species, strains, and phage during early-life gut colonization.

How the analysis works — step by step

1. Validate inputs and sample metadata
Pepkio verifies FASTQ integrity (MD5), read length, paired-end structure, and platform. Sample matrix, host species, batch, and contrasts are recorded in sample_manifest.csv, and samples with sub-threshold yield are flagged (Quince et al., 2017). MGI read headers are normalized when scoped at kickoff.
Tools and outputs
Tools used: md5sum; custom validation scripts
Output: sample_manifest.csv with sample IDs, platform, read counts, host species, and QC flags
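The integrity and pairing checks in step 1 can be sketched in Python. This is a minimal illustration, not Pepkio's validation script: it assumes plain or gzipped FASTQ with strict 4-line records, an expected_md5 dictionary keyed by file name, and a hypothetical min_reads cut-off. The returned row only loosely mirrors sample_manifest.csv.

```python
import gzip
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks (matches `md5sum` output)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_fastq_reads(path: Path) -> int:
    """Count records in a plain or .gz FASTQ, assuming strict 4-line records."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4 (truncated file?)")
    return n_lines // 4


def validate_pair(r1: Path, r2: Path, expected_md5: dict, min_reads: int) -> dict:
    """Return one manifest-style row for a paired-end sample with QC flags."""
    flags = [f"md5_mismatch:{p.name}" for p in (r1, r2) if md5sum(p) != expected_md5.get(p.name)]
    n1, n2 = count_fastq_reads(r1), count_fastq_reads(r2)
    if n1 != n2:
        flags.append("unpaired_read_counts")
    if min(n1, n2) < min_reads:
        flags.append("low_yield")  # min_reads is a hypothetical, project-specific cut-off
    return {"r1_reads": n1, "r2_reads": n2, "qc_flags": ";".join(flags) or "PASS"}
```

A mismatched R1/R2 count usually points to a truncated transfer, which is why it is checked before any trimming.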
2. QC and trim raw reads
FastQC assesses per-base quality, adapter content, and duplication; fastp trims adapters and low-quality ends when needed (Chen et al., 2018). Low Q30 yield or extreme adapter contamination is flagged before host removal. MultiQC aggregates per-sample metrics (Ewels et al., 2016).
Tools and outputs
Tools used: FastQC 0.12.1; fastp 0.24.0; MultiQC 1.25.1
Output: fastqc/ reports; fastp.json / fastp.html; multiqc_report.html
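As a sketch of the Q30 check, the helpers below compute the Q30 fraction directly from Phred+33 quality strings and, alternatively, read the post-filtering Q30 rate from a fastp JSON report. The JSON path summary/after_filtering/q30_rate reflects fastp's report layout as we understand it; verify it against your fastp version. The 0.80 threshold is a hypothetical placeholder, not a Pepkio default.

```python
import json


def q30_fraction(quality_strings, phred_offset=33):
    """Fraction of base calls with Phred quality >= 30 (Phred+33 encoding assumed)."""
    total = passing = 0
    for qual in quality_strings:
        scores = [ord(ch) - phred_offset for ch in qual.strip()]
        total += len(scores)
        passing += sum(q >= 30 for q in scores)
    return passing / total if total else 0.0


def fastp_q30_flag(fastp_json_text, min_q30=0.80):
    """Read the post-filtering Q30 rate from a fastp JSON report and flag low values."""
    report = json.loads(fastp_json_text)
    q30 = report["summary"]["after_filtering"]["q30_rate"]
    return q30, ("LOW_Q30" if q30 < min_q30 else "PASS")  # min_q30 is hypothetical
```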
3. Remove host and contaminant reads
KneadData trims and filters reads and removes host DNA by aligning against GRCh38 or GRCm39 with Bowtie2 (Beghini et al., 2021). Host-depletion rates and post-filter read counts are reported because host DNA confounds shallow metagenomics (Treichel et al., 2026; Franzosa et al., 2018). Samples with insufficient microbial reads after depletion are flagged before profiling.
Tools and outputs
Tools used: KneadData 0.12.0; Bowtie2 2.5.4
Output: Host-depleted FASTQ; host_depletion_summary.csv with pre/post read counts and percentage of reads removed
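The per-sample numbers in host_depletion_summary.csv reduce to simple arithmetic. A minimal sketch, assuming counts are read pairs, 2×150 bp reads, and a 0.5 Gb floor taken from the lower bound of the reference-based taxonomy range on this page; the class and field names are hypothetical. Reads dropped by KneadData's quality trimming are counted as "removed" here, so the percentage slightly overstates true host content.

```python
from dataclasses import dataclass


@dataclass
class HostDepletion:
    """Pre/post counts for one sample; counts are read pairs (assumption)."""
    sample_id: str
    pairs_pre: int
    pairs_post: int
    read_length_bp: int = 150

    @property
    def pct_removed(self) -> float:
        # Includes reads dropped by quality trimming, not only host-aligned reads.
        return 100.0 * (self.pairs_pre - self.pairs_post) / self.pairs_pre

    @property
    def microbial_gb(self) -> float:
        # Two mates per pair, read_length_bp bases each.
        return self.pairs_post * 2 * self.read_length_bp / 1e9

    def flag(self, min_gb: float = 0.5) -> str:
        return "PASS" if self.microbial_gb >= min_gb else "INSUFFICIENT_MICROBIAL_DEPTH"
```

For example, 10 million pairs with 80% removed leave 2 million pairs, or 0.6 Gb of microbial sequence, just above the assumed floor.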
4. Profile taxonomy with MetaPhlAn 4
MetaPhlAn 4 maps reads to clade-specific marker genes from its species-level genome bin (SGB) catalog and estimates relative abundance down to the SGB level where markers support it; strain-level tracking uses StrainPhlAn and is scoped separately (Blanco-Míguez et al., 2023). The unclassified read fraction is compared against depth expectations before downstream testing.
Tools and outputs
Tools used: MetaPhlAn 4.1.0
Output: Per-sample MetaPhlAn profiles; merged metaphlan4_species.tsv
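Merging per-sample MetaPhlAn profiles into a species table can be sketched as follows. It assumes the default MetaPhlAn 4 tab-separated profile (clade name, taxid, relative abundance, additional species) with '#' comment lines, and keeps only rows whose last rank is s__, skipping t__ (SGB) rows. MetaPhlAn ships its own merge_metaphlan_tables.py utility; this standalone version is only for illustration.

```python
def species_abundances(profile_lines):
    """Map species name -> relative abundance (%) for one MetaPhlAn profile."""
    abundances = {}
    for line in profile_lines:
        if line.startswith("#") or not line.strip():
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        last_rank = fields[0].split("|")[-1]
        if last_rank.startswith("s__"):  # skip higher ranks and t__ (SGB) rows
            abundances[last_rank[3:]] = float(fields[2])
    return abundances


def merge_species(profiles):
    """profiles: {sample_id: lines}. Returns (samples, {species: [abundance per sample]})."""
    per_sample = {sid: species_abundances(lines) for sid, lines in profiles.items()}
    samples = sorted(per_sample)
    species = sorted(set().union(*per_sample.values()))
    return samples, {sp: [per_sample[s].get(sp, 0.0) for s in samples] for sp in species}


def to_tsv(samples, table):
    """Render the merged table as TSV text (species x samples)."""
    rows = ["species\t" + "\t".join(samples)]
    rows += [sp + "\t" + "\t".join(f"{v:g}" for v in vals) for sp, vals in table.items()]
    return "\n".join(rows) + "\n"
```

Species absent from a sample are filled with 0.0, which treats "not detected" as zero abundance; that is a modelling choice worth stating in any Methods text.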
5. Classify reads with Kraken2 and re-estimate abundance with Bracken
Kraken2 assigns reads by k-mer matching against the PlusPF database (Wood et al., 2019); Bracken then re-estimates species-level abundance from the Kraken2 report (Lu et al., 2017). Results are cross-checked against MetaPhlAn to flag likely false-positive ("phantom") taxa, which k-mer classifiers can report at high depth (Johnson et al., 2022; McGill et al., 2024).
Tools and outputs
Tools used: Kraken2 2.1.3; Bracken 2.9
Output: Kraken2 reports; merged kraken2_bracken_species.tsv
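The cross-check between profilers can be illustrated with a simple label per species. The sketch assumes Bracken's standard species output with name and fraction_total_reads columns, normalizes MetaPhlAn names by replacing underscores with spaces, and uses a hypothetical 0.1% cut-off for "low abundance". Real reconciliation is harder: MetaPhlAn SGB names (for example, unnamed GGB/SGB bins) and NCBI names do not always map one-to-one.

```python
def bracken_species(lines):
    """Parse Bracken species output into {name: fraction_total_reads}."""
    header = lines[0].rstrip("\n").split("\t")
    name_i, frac_i = header.index("name"), header.index("fraction_total_reads")
    fractions = {}
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        fractions[fields[name_i]] = float(fields[frac_i])
    return fractions


def _norm(name):
    """MetaPhlAn joins genus and species with '_'; Bracken (NCBI names) uses a space."""
    return name.replace("_", " ").strip()


def classify_calls(metaphlan_pct, bracken_fraction, low_abundance=0.001):
    """Label each species as concordant, kraken2_only(_low), or metaphlan_only."""
    mpa = {_norm(k) for k, v in metaphlan_pct.items() if v > 0}
    brk = {_norm(k): v for k, v in bracken_fraction.items() if v > 0}
    labels = {}
    for sp in sorted(mpa | set(brk)):
        if sp in mpa and sp in brk:
            labels[sp] = "concordant"
        elif sp in brk:
            labels[sp] = "kraken2_only_low" if brk[sp] < low_abundance else "kraken2_only"
        else:
            labels[sp] = "metaphlan_only"
    return labels
```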
6. Quantify functional potential with HUMAnN 3
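HUMAnN 3 maps reads to ChocoPhlAn species pangenomes, searches unmapped reads against UniRef90 protein families by translated alignment, and aggregates gene-family abundances into MetaCyc pathway abundances (Franzosa et al., 2018; Beghini et al., 2021). Samples below the agreed pathway depth are flagged before differential testing (Treichel et al., 2026).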
Tools and outputs
Tools used: HUMAnN 3.9
Output: pathabundance.tsv; genefamilies.tsv; per-sample HUMAnN logs with mapping statistics
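For downstream comparisons, HUMAnN pathway abundances are usually renormalized, for example to copies per million (CPM). The supported route is HUMAnN's humann_renorm_table utility; the standalone sketch below only shows the arithmetic for one sample's unstratified pathways. Excluding UNMAPPED and UNINTEGRATED from the denominator is a choice made here for illustration, so results may differ from humann_renorm_table's defaults.

```python
SPECIAL_FEATURES = {"UNMAPPED", "UNINTEGRATED"}


def unstratified_cpm(pathabundance_lines, include_special=False):
    """CPM-normalize unstratified rows of a single-sample HUMAnN pathabundance table."""
    values = {}
    for line in pathabundance_lines:
        if line.startswith("#"):
            continue  # header line
        feature, value = line.rstrip("\n").split("\t")[:2]
        if "|" in feature:
            continue  # stratified (per-taxon) rows are skipped
        if feature in SPECIAL_FEATURES and not include_special:
            continue
        values[feature] = float(value)
    total = sum(values.values())
    return {f: v / total * 1e6 for f, v in values.items()} if total else {}
```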
7. Co-assemble and bin MAGs when scoped
MEGAHIT co-assembles host-depleted reads; MetaBAT2 bins contigs; CheckM2 assesses completeness and contamination (Li et al., 2015; Kang et al., 2019; Chklovski et al., 2023). MAG chimerism limits are documented because even high-quality MAGs may not represent a single strain (Treichel et al., 2026; Meyer et al., 2022). Optional; scoped at kickoff.
Tools and outputs
Tools used: MEGAHIT 1.2.9; MetaBAT2 2.17; CheckM2 1.0.2
Output: {sample_or_cohort}.contigs.fa; {bin}.fa MAGs; mag_qc_summary.csv with CheckM2 completeness, contamination, and the completeness model used
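Binned MAGs are often tiered by completeness and contamination. The sketch below reads a CheckM2-style quality report (tab-separated, with Name, Completeness, and Contamination columns) and applies MIMAG-style cut-offs, which we assume here for illustration. Full MIMAG high-quality status also requires rRNA and tRNA evidence that CheckM2 does not assess, so the top tier is labeled "draft".

```python
import csv
import io


def mag_tier(completeness: float, contamination: float) -> str:
    """MIMAG-style tier from completeness/contamination percentages (assumed cut-offs)."""
    if completeness > 90 and contamination < 5:
        return "high_quality_draft"  # full MIMAG HQ also needs rRNA genes and >=18 tRNAs
    if completeness >= 50 and contamination < 10:
        return "medium_quality"
    if contamination < 10:
        return "low_quality"
    return "excess_contamination"


def summarize_checkm2(report_text: str) -> list:
    """Parse a tab-separated CheckM2-style report and attach a tier to each bin."""
    rows = []
    for row in csv.DictReader(io.StringIO(report_text), delimiter="\t"):
        comp, cont = float(row["Completeness"]), float(row["Contamination"])
        rows.append({"bin": row["Name"], "completeness": comp,
                     "contamination": cont, "tier": mag_tier(comp, cont)})
    return rows
```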
8. Compute alpha and beta diversity
Alpha diversity (Shannon, Simpson, observed richness) and beta diversity (Bray-Curtis, and Aitchison distance after a centered log-ratio (CLR) transform) are computed on rarefied or transformed abundance tables (McMurdie & Holmes, 2013). PCoA or NMDS ordinations visualize community structure, and PERMANOVA tests whether communities separate by metadata factors (Anderson, 2001).
Tools and outputs
Tools used: phyloseq 1.48.0; vegan 2.6-8.1
Output: alpha_diversity.csv; beta_diversity_distance_matrix.csv; PERMANOVA results table
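The diversity metrics named in step 8 can be written out in a few lines of plain Python. This is an illustrative sketch rather than the phyloseq/vegan code used in the pipeline: Shannon uses the natural log, Simpson is the Gini-Simpson form (1 - sum of p^2), and the CLR transform adds a hypothetical 0.5 pseudocount so that zero counts can be logged.

```python
import math


def shannon(counts):
    """Shannon index H' with natural log; zero counts contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def observed(counts):
    """Observed richness: number of features with nonzero counts."""
    return sum(1 for c in counts if c > 0)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def clr(x, pseudocount=0.5):
    """Centered log-ratio transform: log values minus their mean."""
    logs = [math.log(v + pseudocount) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def aitchison(x, y, pseudocount=0.5):
    """Aitchison distance: Euclidean distance between CLR-transformed vectors."""
    return math.dist(clr(x, pseudocount), clr(y, pseudocount))
```

Because CLR removes the per-sample scale, two samples with proportional counts have an Aitchison distance of zero, which is why it suits compositional tables where Bray-Curtis on raw counts would not.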
9. Test differential abundance
Taxa and pathways are tested with ANCOM-BC or MaAsLin2, which accommodate compositional data and support covariate adjustment (Lin & Peddada, 2020; Mallick et al., 2021). Multiple testing is controlled with the Benjamini-Hochberg FDR procedure unless a pre-specified alternative is agreed at kickoff.
Tools and outputs
Tools used: ANCOM-BC 2.4.0; MaAsLin2 1.18.0
Output: da_results_<contrast>.csv with feature-level coefficients, standard errors, p-values, and q-values
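ANCOM-BC and MaAsLin2 report their own q-values, but the Benjamini-Hochberg adjustment itself is short enough to show. The function below is a minimal sketch intended to match R's p.adjust(method = "BH"): sort p-values, scale each by m/rank, then enforce monotonicity from the largest p-value downward.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For p-values [0.01, 0.04, 0.03, 0.005], the adjusted values are [0.02, 0.04, 0.04, 0.02].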
10. Package figures, scripts, and Methods draft
MultiQC aggregates QC metrics across samples. Figure-ready plots, commented scripts, README, and a Methods draft listing exact tool versions and database builds are packaged per agreed retention policy (Meyer et al., 2022).
Tools and outputs
Tools used: R/Python plotting scripts; MultiQC 1.25.1
Output: PDF/SVG figures; final deliverable bundle with scripts, README, Methods draft, and HTML QC report

What Pepkio delivers

Processed data files

Host-depleted FASTQ (when agreed); MetaPhlAn and Kraken/Bracken abundance tables
HUMAnN pathabundance.tsv and genefamilies.tsv; diversity and differential-abundance tables
QC summaries; optional MAG .fa files

Figures (PDF/SVG)

MultiQC summary; rarefaction curves (when applicable); stacked barplots
Heatmaps of top taxa and pathways; PCoA/NMDS ordination with metadata coloring
Differential-abundance volcano or coefficient plots

Code

Commented bash, R, and Python scripts per stage; conda lockfile or sessionInfo() export
Delivery via private Git or agreed file transfer

Documentation

HTML MultiQC report; README with reproduction instructions
Methods draft with exact software versions, database builds, and statistical tests
Bespoke milestones scoped at kickoff; post-delivery reviewer support within agreed scope (typically ≤20% of deliverables)

Technical decisions we make — and why

Dual taxonomic profilers: MetaPhlAn 4.1.0 and Kraken2 2.1.3 + Bracken 2.9: MetaPhlAn uses marker genes; Kraken2 uses k-mers and can produce phantom taxa at high depth (Wood et al., 2019; Blanco-Míguez et al., 2023; Johnson et al., 2022; McGill et al., 2024). Both profiles are delivered so clients can treat discordant low-abundance calls with caution.
Host removal: KneadData with GRCh38 or GRCm39: Host DNA confounds shallow metagenomics (Treichel et al., 2026; Franzosa et al., 2018). Depletion can be skipped for low-host matrices (e.g., soil) but is the default for stool.
Functional profiling: HUMAnN 3.9 with ChocoPhlAn SGB (Jun 2023): Read-based pathway quantification for human-associated metagenomes (Franzosa et al., 2018). Assembly-based CDS detection has poor sensitivity below ~5× coverage (Ye et al., 2020). De novo assembly is scoped separately for novel-gene discovery.
Differential testing: ANCOM-BC or MaAsLin2: Compositional methods avoid inflated false positives from raw t-tests (Lin & Peddada, 2020; Mallick et al., 2021). Method choice confirmed at kickoff.
Depth thresholds: goal-specific: Reference-based taxonomy at 0.5–1.0 Gb; pathways at ≥2 Gb; MAG/proteome at >10 Gb (Treichel et al., 2026). Underpowered samples are flagged before testing.

Common questions

What is the minimum sequencing depth and sample count for shotgun metagenomics analysis?

For species-level taxonomic profiling, Pepkio recommends ≥0.5–1.0 Gb per sample when reference databases cover the community (Treichel et al., 2026). Pathway-level inference typically requires ≥2 Gb; species-level MWAS benefits from ≥15 million reads (Liu et al., 2021). Sample count depends on effect size, prevalence, and contrast design; minimum n and power are confirmed at kickoff. MAG recovery requires ≥10 Gb and is scoped separately.
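Because depth is quoted both in Gb and in reads, a quick conversion helps when planning. The sketch assumes 2×150 bp paired reads counted as read pairs (whether a quoted read count means pairs or single mates should be confirmed for each study) and encodes the Gb lower bounds stated on this page; the goal names are hypothetical labels.

```python
def gigabases(read_pairs: int, read_length_bp: int = 150) -> float:
    """Sequenced bases in Gb for paired-end data: pairs x 2 mates x read length."""
    return read_pairs * 2 * read_length_bp / 1e9


# Lower bounds quoted on this page (Treichel et al., 2026); labels are hypothetical.
DEPTH_GOALS_GB = {
    "reference_taxonomy": 0.5,
    "pathways": 2.0,
    "mag_recovery": 10.0,
}


def supported_goals(microbial_gb: float) -> list:
    """Analysis goals whose minimum depth is met by the post-host-depletion yield."""
    return [goal for goal, minimum in DEPTH_GOALS_GB.items() if microbial_gb >= minimum]
```

For example, 5 million 2×150 bp read pairs give 1.5 Gb, which clears the taxonomy floor but not the 2 Gb pathway floor. Apply the check to post-depletion yield, not raw output.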

Can you analyze low-quality or low-yield shotgun metagenomics libraries?

Yes, with caveats. Low Q30 yield, high adapter content, or insufficient post-host-depletion reads reduce taxonomic and functional sensitivity (Treichel et al., 2026). Sub-threshold samples are flagged in the QC report; re-sequencing is discussed before full differential testing. Shallow shotgun (2–5 million reads) showed high concordance with deep shotgun for alpha/beta diversity and species composition in dense longitudinal stool sampling (La Reau et al., 2023), but rare-taxa and strain-level claims require higher depth.

Do you support Illumina and MGI DNBSEQ shotgun metagenomics data?

Yes. Illumina NovaSeq X, 6000, NextSeq 2000, and MiSeq paired-end FASTQs use the standard workflow. MGI DNBSEQ T7, G400, and G99 paired-end FASTQs are supported; read-header normalization is applied when scoped at kickoff. Cross-platform profiles can be compared when library prep, profiling strategy, and depth are harmonized (Gulyás et al., 2024). Ion Torrent and long-read metagenomics are scoped separately when feasibility is confirmed during intake.

How long does shotgun metagenomics analysis take at Pepkio?

Standard projects (roughly 4–24 samples with host depletion, dual profiling, HUMAnN, and differential abundance for one contrast) typically complete in 3–5 weeks. Deep shotgun with MAG recovery, multi-contrast designs, or >24 samples may require 5–8 weeks. Milestone check-ins occur after QC, after profiling, and before delivery; exact timelines are confirmed at kickoff.

How do you handle batch effects in multi-batch shotgun metagenomics cohorts?

Batch effects from extraction kit, sequencing center, and flowcell can exceed biological signal if unaddressed (Quince et al., 2017; Gulyás et al., 2024). Pepkio records batch covariates in sample_manifest.csv, stratifies QC by batch, and includes batch in MaAsLin2 or ANCOM-BC when specified at kickoff. Batch randomization across conditions is recommended at study design; correction beyond standard covariate adjustment is scoped when needed.
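Before any covariate adjustment, it is worth checking whether batch and condition are separable at all. The sketch cross-tabulates manifest rows and reports batches that hold only one condition; if every batch does, condition is fully aliased with batch and covariate adjustment cannot separate the two. The batch and condition column names are hypothetical stand-ins for whatever sample_manifest.csv uses, and the rows could come from csv.DictReader.

```python
from collections import Counter, defaultdict


def batch_condition_counts(manifest_rows, batch_key="batch", condition_key="condition"):
    """Cross-tabulate samples per batch and condition from manifest rows (dicts)."""
    table = defaultdict(Counter)
    for row in manifest_rows:
        table[row[batch_key]][row[condition_key]] += 1
    return table


def check_batch_design(manifest_rows, batch_key="batch", condition_key="condition"):
    """Return (batches holding a single condition, whether batch fully aliases condition)."""
    table = batch_condition_counts(manifest_rows, batch_key, condition_key)
    single = sorted(b for b, counts in table.items() if len(counts) == 1)
    fully_aliased = bool(table) and len(single) == len(table)
    return single, fully_aliased
```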

Do I own the code — and in what format is it delivered?

Yes — you retain full ownership of code, scripts, and results. Pepkio delivers commented bash, R, and Python scripts with conda lockfiles or sessionInfo() exports so you can rerun from raw FASTQs on Linux or HPC. Deliverables are organized by pipeline stage with README instructions; Jupyter or R Markdown notebooks are available on request.

Can I be involved during the analysis?

Yes. Checkpoint reviews occur after QC and host depletion, after taxonomic and functional profiling, and before final delivery. Within agreed scope, you can review metadata, covariates, filtering thresholds, and contrasts before final statistics. A PhD-level scientific contact leads the project, coordinates milestone feedback, and documents decisions in the project record.

What does post-delivery reviewer support include?

Post-delivery support covers clarification of methods, QC thresholds, database builds, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Methods and Supplementary drafts are included for analyses Pepkio performed. Substantial new analyses, re-analysis with new contrasts, or major rewrites requested after delivery are scoped as separate milestones with updated pricing.

Is co-authorship required?

No. Pepkio does not require co-authorship unless explicitly discussed and agreed in writing before project start. Acknowledgment of bioinformatics support in the Methods or Acknowledgments section is standard practice and appreciated by our team.

Should I trust MetaPhlAn or Kraken2 results when they disagree?

Treat species-level calls on which both profilers agree as high-confidence; treat Kraken2-only low-abundance taxa with caution, because k-mer classifiers can misassign reads from abundant species at high depth (Johnson et al., 2022). MetaPhlAn-only calls depend on marker-gene detection, and MetaPhlAn cannot report taxa absent from its SGB catalog, so discordance can originate on either side. Pepkio delivers both profiles and documents discordance in the QC report; consensus filtering or integrative approaches are scoped at kickoff when clients require a single feature table.

How much host DNA is too much for reliable shotgun metagenomics profiling?

Host DNA is a confounder in shallow metagenomics and reduces effective microbial depth (Treichel et al., 2026). Pepkio reports the host-depletion rate per sample; samples with very high pre-depletion host fractions often retain insufficient microbial depth for pathway or rare-taxa analysis even after KneadData. Matrix-specific thresholds (stool vs. swab vs. environmental) are confirmed at kickoff.

Can Pepkio run MAG recovery, StrainPhlAn, or other custom shotgun metagenomics analyses?

Yes. MAG co-assembly and binning, StrainPhlAn strain tracking, custom Kraken2 or MetaPhlAn databases, and client-specified statistical models are scoped at kickoff with milestone pricing. Resistome or other specialized profiling is available when reference databases and study design support the endpoint. Pepkio also accepts pre-built MetaPhlAn, Kraken2, or HUMAnN profiles when clients need downstream statistics only.

Related services

16S rRNA amplicon sequencing — Cost-effective ASV-resolved community profiling when species-level taxonomy and direct functional quantification are not required.
Metatranscriptomics — Active gene expression profiling to complement DNA-based functional potential estimates.
Statistical analysis — Replicate planning, contrast design, and power estimation before sequencing.
Metabolomics — Small-molecule measurements to complement HUMAnN pathway predictions.
Custom consulting — Sequencing depth, platform, and host-depletion planning before library prep.

References

Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nature Biotechnology. 2017;35(9):833–844. https://doi.org/10.1038/nbt.3935 (PMID: 28898207)
Meyer F, Fritz A, Deng Z-L, et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nature Methods. 2022;19(4):429–440. https://doi.org/10.1038/s41592-022-01431-4 (PMID: 35396482)
Treichel NS, Pauvert C, Séneca J, et al. Benchmarking of shotgun sequencing depth reveals the potential and limitations of shallow metagenomics and strain-level analysis. Nature Microbiology. 2026. https://doi.org/10.1038/s41564-026-02334-2 (PMID: 42014453)
Blanco-Míguez A, Beghini F, Cumbo F, et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nature Biotechnology. 2023;41(11):1633–1644. https://doi.org/10.1038/s41587-023-01688-w (PMID: 36823356)
Franzosa EA, McIver LJ, Rahnavard G, et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nature Methods. 2018;15(11):962–968. https://doi.org/10.1038/s41592-018-0176-y (PMID: 30377376)
Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology. 2019;20(1):257. https://doi.org/10.1186/s13059-019-1891-0 (PMID: 31779668)
Lu J, Breitwieser FP, Thielen P, Salzberg SL. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science. 2017;3:e104. https://doi.org/10.7717/peerj-cs.104
La Reau AJ, Strom NB, Filvaroff E, et al. Shallow shotgun sequencing reduces technical variation in microbiome analysis. Scientific Reports. 2023;13:7668. https://doi.org/10.1038/s41598-023-33489-1 (PMID: 37169816)
Liu J, Wang X, Xie H, Zhong Q, Xia Y. Analysis and evaluation of different sequencing depths from 5 to 20 million reads in shotgun metagenomic sequencing, with optimal minimum depth being recommended. Genome. 2021;64(12):1111–1121. https://doi.org/10.1139/gen-2021-0120 (PMID: 35939836)
Lloyd-Price J, Arze C, Ananthakrishnan AN, et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019;569(7758):655–662. https://doi.org/10.1038/s41586-019-1237-9 (PMID: 31142855)
Aggarwala V, Mogno I, Li Z, et al. Precise quantification of bacterial strains after fecal microbiota transplantation delineates long-term engraftment and explains outcomes. Nature Microbiology. 2021;6(10):1309–1318. https://doi.org/10.1038/s41564-021-00966-0 (PMID: 34580445)
Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Research. 2013;23(1):111–120. https://doi.org/10.1101/gr.142315.112 (PMID: 22936250)
Gulyás G, Kakuk B, Dörmő Á, et al. Cross-comparison of gut metagenomic profiling strategies. Communications Biology. 2024;7:715. https://doi.org/10.1038/s42003-024-07158-6 (PMID: 39505993)
Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nature Communications. 2020;11:3514. https://doi.org/10.1038/s41467-020-17041-7 (PMID: 32665548)
Mallick H, Rahnavard G, McIver LJ, et al. Multivariable association discovery in population-scale meta-omics studies. PLOS Computational Biology. 2021;17(11):e1009442. https://doi.org/10.1371/journal.pcbi.1009442 (PMID: 34784344)
Johnson JS, Sun S, Fodor AA. Systematic classification error profoundly impacts inference in high-depth whole genome shotgun sequencing datasets. bioRxiv. 2022. https://doi.org/10.1101/2022.04.04.487034
McGill SK, Walker RL, Fiehn O, et al. Integrative analysis across metagenomic taxonomic classifiers: a case study of the gut microbiome in aging and longevity in the Integrative Longevity Omics Study. PLOS Computational Biology. 2024;20(12):e1013883. https://doi.org/10.1371/journal.pcbi.1013883
Mas-Lloret J, Obón-Santacana M, Ibáñez-Sanz G, et al. Gut microbiome diversity detected by high-coverage 16S and shotgun sequencing of paired stool and colon sample. Scientific Data. 2020;7:92. https://doi.org/10.1038/s41597-020-0427-5 (PMID: 32179734)
Beghini F, McIver LJ, Blanco-Míguez A, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife. 2021;10:e65088. https://doi.org/10.7554/eLife.65088 (PMID: 33944776)
Ye SH, Siddle KJ, Park DJ, Sabeti PC. Benchmarking metagenomics tools for taxonomic and functional profiling. BMC Bioinformatics. 2020;21:427. https://doi.org/10.1186/s12859-020-03802-0
Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. https://doi.org/10.1093/bioinformatics/btv033 (PMID: 25609793)
Kang DD, Li F, Kirton E, et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from complex metagenomic habitats. PeerJ. 2019;7:e7359. https://doi.org/10.7717/peerj.7359 (PMID: 31388474)
Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nature Methods. 2023;20(8):1203–1212. https://doi.org/10.1038/s41592-023-01940-w (PMID: 37500759)
Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. https://doi.org/10.1093/bioinformatics/bty624 (PMID: 30423086)
Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. https://doi.org/10.1093/bioinformatics/btw354 (PMID: 27312411)
McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8(4):e61217. https://doi.org/10.1371/journal.pone.0061217 (PMID: 23630581)
Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecology. 2001;26(1):32–46. https://doi.org/10.1046/j.1442-9993.2001.01070.x

Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
