
Metatranscriptomics Analysis Service — Active Pathway Expression from rRNA-Filtered FASTQs to HUMAnN Profiles and Differential-Activity Tables

Metatranscriptomics profiles actively expressed microbial genes and pathways—answering which community members are metabolically active, not just present (Zhang et al., 2021). Pepkio delivers version-pinned QC, rRNA filtering, MetaPhlAn and HUMAnN quantification, differential-activity testing, code, and a Methods draft for academic, biotech, and pharma teams. Custom inputs, outputs, and non-standard analyses are scoped at kickoff. Human gut projects typically target ≥40–50 million raw paired-end reads per sample (Westreich et al., 2016).

Key facts

Key facts about Metatranscriptomics
Fact | Value
--- | ---
Supported platforms / instruments | Illumina NovaSeq X / 6000 / NextSeq 2000, HiSeq 2500/4000; MGI DNBSEQ-T7 / G400 / G99 when scoped at kickoff; ribo-depleted total RNA (e.g., Ribo-Zero, QIAseq FastSelect, NEBNext rRNA Depletion); stranded libraries when metadata are provided; pre-built HUMAnN or MetaPhlAn outputs accepted on request
Input requirements | ≥40–50 million raw paired-end reads per human gut sample for >90% accuracy on low-abundance transcript estimates after annotation (Westreich et al., 2016); 2×100 bp or 2×150 bp reads typical; ≥3 biological replicates per condition recommended for differential testing (fewer are flagged at kickoff); RIN and RNA extraction metadata encouraged; paired shotgun metagenomes optional for gene-copy normalization
Reference builds supported | ChocoPhlAn SGB (Jun 2023) for HUMAnN 3.9; MetaPhlAn 4.1.0 marker database; SILVA 138 (SortMeRNA rRNA filtering); UniRef90 for HUMAnN; host subtraction against GRCh38 or GRCm39 when scoped
Primary tools (with versions) | SortMeRNA 4.3.7; HUMAnN 3.9; MetaPhlAn 4.1.0; Kraken2 2.1.3; Bracken 2.9; Bowtie2 2.5.4; Salmon 1.10.3; fastp 0.24.0; FastQC 0.12.1; MultiQC 1.25.1; MaAsLin2 1.18.0; ANCOM-BC 2.4.0; MEGAHIT 1.2.9 (assembly scoped on request)
Typical turnaround time | 5–8 weeks (standard cohort, ≤24 samples, one contrast, profiling through differential activity); multi-contrast or paired DNA+RNA designs may extend the timeline (confirmed at kickoff)
Deliverable formats | HUMAnN pathabundance.tsv and genefamilies.tsv; MetaPhlAn relative-abundance profiles; differential-activity tables (.csv); PDF/SVG figures; HTML MultiQC report; commented R/Python scripts; Methods draft
Key cited best-practice reference | Zhang et al. (2021), Annual Review of Biomedical Data Science; Franzosa et al. (2018), Nature Methods (HUMAnN functional profiling)
Custom / bespoke analysis | Paired DNA normalization, assembly-based Salmon quantification, co-expression networks, AMR/virulence panels, custom references, or client-specified models (scoped at kickoff)

What is metatranscriptomics?

Metatranscriptomics sequences RNA from mixed microbial communities and quantifies the reads to measure which genes and pathways are actively transcribed at sampling time, not merely encoded in community DNA. Unlike shotgun metagenomics, which reports gene catalog presence and copy number, metatranscriptomics captures dynamic activity such as nutrient utilization and stress responses (Franzosa et al., 2018). Unlike 16S amplicon sequencing, it resolves functional expression without inferring activity from taxonomy. Automated platforms have profiled cohorts exceeding 10,000 human stool samples (Hatch et al., 2019). Pepkio processes ribo-depleted FASTQs through host and rRNA filtering, taxonomic activity profiling, and HUMAnN pathway quantification with documented parameters; custom entry points are agreed at kickoff. See the metatranscriptomics glossary.

When should you use metatranscriptomics?

Metatranscriptomics fits when the research question requires active microbial function—pathway upregulation after treatment, transcriptional response to host immune activation, or time-resolved community activity—rather than static community membership.

Comparison of metatranscriptomics, shotgun metagenomics, and 16S amplicon sequencing
Approach | Best for | Limitations | Approximate cost range
--- | --- | --- | ---
Metatranscriptomics | Active pathway and gene-family expression; perturbation and treatment-response studies; pairing with host phenotypes | RNA degradation; rRNA depletion kit bias; needs several-fold greater depth than metagenomics to capture rare transcripts (Ojala et al., 2023) | Higher per-sample sequencing and bioinformatics cost than 16S or shallow shotgun
Shotgun metagenomics | Species/strain catalog, gene presence, MAG recovery, gene copy number | Does not distinguish expressed from silent genes | Moderate–high sequencing and storage cost
16S rRNA amplicon | Large cohorts, cost-effective taxonomy, longitudinal membership tracking | No direct functional expression; limited species resolution | Lowest per-sample cost
  • Drug metabolism by gut microbes: Javdan et al. (2020) mapped microbial drug-metabolizing activity in human communities and validated robust expression of a widespread 20β-HSDH gene in metatranscriptomic data from a patient-derived consortium.
  • Immune activation without compositional shift: Becattini et al. (2021) showed commensal transcription reprogrammed within 6 hours of host innate or adaptive immune activation—stress genes up, carbohydrate-utilization genes down—while 16S-based community composition remained stable.
  • Paired DNA and RNA in the human gut: Franzosa et al. (2018) demonstrated that metatranscriptomic pathway profiles resolve active metabolic responses that taxonomic or metagenomic DNA profiles alone do not capture.

How the analysis works — step by step

  1. Validate inputs and sample metadata

    Pepkio confirms FASTQ integrity (MD5 checksums), read layout, platform, ribo-depletion method, and experimental design. Sample metadata are recorded in sample_manifest.csv. Designs with fewer than three biological replicates per condition are flagged before differential testing.

    Tools and outputs

    Tools used: Custom validation scripts; md5sum

    Output: sample_manifest.csv with library IDs, read counts, depletion kit, host species, and QC flags

  2. QC and trim raw reads

    Adapter contamination, low-quality tails, and overrepresented sequences are assessed per library. When trimming is warranted, fastp trims adapters and low-quality bases and discards reads that fail quality or length filters before downstream steps (Chen et al., 2018). Aggregated metrics are compiled in MultiQC for review (Ewels et al., 2016).

    Tools and outputs

    Tools used: FastQC 0.12.1; fastp 0.24.0; MultiQC 1.25.1

    Output: Per-sample FastQC/fastp reports; multiqc_report.html

  3. Remove host reads

    In mucosal swabs and tissue, host RNA can dominate without subtraction (Ojala et al., 2023). Stool often shows low host fractions (Westreich et al., 2016); host mapping is applied when scoped. Excessive host fractions are flagged before profiling.

    Tools and outputs

    Tools used: Bowtie2 2.5.4; GRCh38 or GRCm39 reference index

    Output: Host-depleted FASTQ; host_removal_summary.csv with columns: sample_id, total_reads, host_reads, host_fraction

  4. Filter ribosomal RNA

    Residual rRNA that survives experimental depletion is removed in silico. Depletion kits leave residual rRNA fractions that vary by species; SortMeRNA filtering removes these reads so that functional profiling runs on putative mRNA (Westreich et al., 2016; Ojala et al., 2023).

    Tools and outputs

    Tools used: SortMeRNA 4.3.7; SILVA 138 SSU/LSU databases

    Output: rRNA-depleted FASTQ; rrna_filter_summary.csv with columns: sample_id, reads_pre_filter, rrna_reads, rrna_fraction, reads_post_filter

  5. Profile transcriptionally active taxa

    MetaPhlAn 4.1.0 estimates relative abundance of transcriptionally active lineages via marker genes (Blanco-Míguez et al., 2023). Kraken2 2.1.3 + Bracken 2.9 is run as an optional cross-check when scoped (Wood et al., 2019; Lu et al., 2017).

    Tools and outputs

    Tools used: MetaPhlAn 4.1.0; Kraken2 2.1.3 + Bracken 2.9 (on request)

    Output: metaphlan_profile.tsv; optional kraken2_bracken_abundance.csv

  6. Quantify pathway and gene-family expression

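    HUMAnN 3.9 profiles microbial gene families and pathways from filtered reads (Franzosa et al., 2018). It uses the MetaPhlAn profile to select detected species, maps reads to their pangenomes in ChocoPhlAn SGB (Jun 2023), and searches the remaining unmapped reads against UniRef90 by translated search; results are stratified by contributing species. Gene-family abundances are reported in reads per kilobase (RPK), and pathway abundances are derived from them.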

    Tools and outputs

    Tools used: HUMAnN 3.9; ChocoPhlAn SGB (Jun 2023); UniRef90

    Output: pathabundance.tsv; genefamilies.tsv; per-sample HUMAnN logs

  7. Normalize and audit expression tables

    Pathway and gene-family tables are transformed (CPM, log-CPM, or CLR as agreed at kickoff). Library sizes, detection rates, and sample correlations are audited; PCA or NMDS ordination is reviewed for batch effects. Samples below agreed depth thresholds are flagged before testing (Westreich et al., 2016).

    Tools and outputs

    Tools used: R vegan 2.6-8.1; custom Python/R audit scripts

    Output: expression_qc_summary.csv; PCA/NMDS ordination plots; sample correlation heatmap

  8. Test differential pathway and taxon activity

    MaAsLin2 1.18.0 fits multivariable models with covariates (condition, batch, age, BMI, etc.) on transformed tables for pathway and taxonomic features as agreed at kickoff (Mallick et al., 2021). ANCOM-BC 2.4.0 is used for compositional taxonomic features when appropriate (Lin & Peddada, 2020). Benjamini–Hochberg q-values control FDR across tested features.

    Tools and outputs

    Tools used: MaAsLin2 1.18.0; ANCOM-BC 2.4.0

    Output: da_results_<contrast>.csv with columns: feature, coef, stderr, pval, qval, N; MaAsLin2 coefficient plots

  9. Normalize by paired metagenome gene copy (when scoped)

    When matched shotgun metagenomes are available, HUMAnN expression estimates can be adjusted for gene copy number to distinguish transcriptional upregulation from DNA abundance changes (Franzosa et al., 2018). This step is scoped at kickoff when paired DNA FASTQs or pre-computed metagenomic profiles are provided.

    Tools and outputs

    Tools used: HUMAnN 3.9 --taxonomic-profile; MetaPhlAn 4.1.0 on paired DNA

    Output: pathabundance_copy_normalized.tsv; normalization log documenting paired sample mapping

  10. Package figures, scripts, and Methods draft

    Pathway heatmaps, taxonomic barplots, and differential-activity plots are exported at publication resolution. Commented scripts reproduce agreed pipeline stages within project scope. A Methods draft cites software versions and database builds (Meyer et al., 2022). Salmon 1.10.3 transcript-level quantification against de novo or reference assemblies is available when clients require gene-level counts beyond HUMAnN gene families—scoped at kickoff (Patro et al., 2017).

    Tools and outputs

    Tools used: R/Python plotting scripts; documented workflow archive

    Output: PDF/SVG figures; R/Python scripts; README; Methods draft; final deliverable bundle
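
The step 1 integrity and replicate checks can be sketched in Python. This is a minimal illustration, not Pepkio's validation code. It assumes checksums arrive as an md5sum-style listing (`<digest>  <file name>`) and that sample metadata reduce to a mapping of sample IDs to conditions. The function names are hypothetical, and the replicate threshold mirrors the three-per-condition rule in step 1.

```python
import hashlib
from collections import Counter
from pathlib import Path

MIN_REPLICATES = 3  # replicate threshold from step 1


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse md5sum output lines ('<digest>  <name>') into {name: digest}."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # md5sum prefixes the name with '*' when run in binary mode
        expected[name.lstrip("*")] = digest.lower()
    return expected


def failed_checksums(expected, data_dir):
    """Names of files that are missing or whose MD5 does not match."""
    failures = []
    for name, digest in sorted(expected.items()):
        path = Path(data_dir) / name
        if not path.is_file() or md5_of(path) != digest:
            failures.append(name)
    return failures


def underpowered_conditions(sample_conditions, min_reps=MIN_REPLICATES):
    """Conditions with fewer than min_reps samples in a {sample_id: condition} map."""
    counts = Counter(sample_conditions.values())
    return sorted(cond for cond, n in counts.items() if n < min_reps)
```

Streaming the file in chunks produces the same digest that `md5sum` prints, even for multi-gigabyte gzipped FASTQs.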

What Pepkio delivers

Processed data files

  • HUMAnN pathabundance.tsv and genefamilies.tsv; MetaPhlAn metaphlan_profile.tsv
  • da_results_<contrast>.csv; optional copy-normalized pathway tables; host/rRNA QC summaries

Figures (PDF/SVG)

  • MultiQC summary; read-quality, host/rRNA removal, and taxonomic activity plots
  • Pathway heatmaps; PCA/NMDS ordination; MaAsLin2 coefficient plots

Tables

  • sample_manifest.csv; da_results_<contrast>.csv
  • Optional Kraken2/Bracken abundances

Code

  • Commented R and Python scripts per stage; conda lockfiles or sessionInfo()
  • Delivery via private Git or agreed transfer

Documentation

  • QC report; README; Methods draft
  • Post-delivery support within agreed scope (typically ≤20% of deliverables)

Technical decisions we make — and why

rRNA removal: SortMeRNA 4.3.7 after experimental depletion
Depletion kits leave residual rRNA whose fraction varies by species. These residual rRNA reads are removed in silico before functional profiling (Westreich et al., 2016; Kopylova et al., 2012). Kit depletion alone is not relied on, because uneven depletion skews functional yield.
Functional quantification: HUMAnN 3.9 with MetaPhlAn-informed stratification
Widely used for cross-study pathway comparison with community-aware stratification (Franzosa et al., 2018). Assembly-based or Kraken-only alternatives are scoped when reference coverage is insufficient.
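
HUMAnN reports abundances in RPK, and step 7 lists CPM, log-CPM, or CLR transforms before testing. The sketch below shows what those transforms compute for one sample's table, held as a plain dict mapping feature to value. It rests on several assumptions. The pseudocounts are arbitrary, and whether UNMAPPED or UNINTEGRATED rows stay in the CPM denominator is left to the caller. For real HUMAnN output, the `humann_renorm_table --units cpm` utility is the usual renormalization route.

```python
import math


def unstratified(table):
    """Drop HUMAnN stratified rows, whose IDs contain '|' (e.g. 'PWY-X|g__G.s__S')."""
    return {feature: value for feature, value in table.items() if "|" not in feature}


def cpm(table):
    """Rescale RPK values so the sample sums to one million (copies per million)."""
    total = sum(table.values())
    if total <= 0:
        raise ValueError("sample has no abundance to normalize")
    return {feature: value * 1e6 / total for feature, value in table.items()}


def log_cpm(table, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount keeps zero-abundance features finite."""
    return {feature: math.log2(value + pseudocount) for feature, value in cpm(table).items()}


def clr(table, pseudocount=0.5):
    """Centered log-ratio: ln(x + p) minus the sample's mean ln(x + p)."""
    logs = {feature: math.log(value + pseudocount) for feature, value in table.items()}
    center = sum(logs.values()) / len(logs)
    return {feature: value - center for feature, value in logs.items()}
```

Apart from the pseudocount, CLR is scale-invariant, so it gives the same result on RPK or CPM input.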
Differential testing: MaAsLin2 1.18.0 with explicit covariates
Multivariable models with batch and continuous covariates on transformed meta-omic tables (Mallick et al., 2021). ANCOM-BC 2.4.0 for compositional taxonomic features when appropriate (Lin & Peddada, 2020).
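
Step 8 controls FDR with Benjamini–Hochberg q-values, which is also MaAsLin2's default correction. Here is a standard-library sketch of the adjustment, intended to match R's `p.adjust(method = "BH")`. The function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the original order of pvalues."""
    m = len(pvalues)
    # Indices of p-values sorted ascending; rank r (1-based) is position + 1
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # caps q-values at 1
    # Walk from the largest p-value down so q-values stay monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For p-values 0.01, 0.04, 0.03, and 0.20, the function returns about 0.04, 0.053, 0.053, and 0.20. The running minimum pulls the third-ranked value down onto the second.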
Host subtraction: Bowtie2 2.5.4 against GRCh38 or GRCm39
Host RNA can dominate mucosal and tissue samples (Ojala et al., 2023); stool often shows low host fractions (Westreich et al., 2016). Environmental samples skip this step; host species is confirmed at kickoff.
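
The host fractions in host_removal_summary.csv can be estimated from Bowtie2's alignment summary. When reads are aligned to a host index, the overall alignment rate approximates the host-derived fraction. This is a sketch under stated assumptions: for paired-end input, the first summary line counts read pairs, so interpret `total_reads` accordingly. The 0.9 cutoff is a hypothetical placeholder, not a Pepkio threshold.

```python
import re

# Bowtie2 writes its alignment summary to stderr; these lines open and close it.
TOTAL_RE = re.compile(r"^(\d+) reads; of these:", re.MULTILINE)
RATE_RE = re.compile(r"^(\d+(?:\.\d+)?)% overall alignment rate", re.MULTILINE)


def host_fraction_from_bowtie2_log(log_text):
    """Return (total_reads, host_fraction) from a Bowtie2 summary against a host index."""
    total = TOTAL_RE.search(log_text)
    rate = RATE_RE.search(log_text)
    if total is None or rate is None:
        raise ValueError("text does not look like a Bowtie2 alignment summary")
    # Reads aligning to the host genome approximate the host-derived fraction
    return int(total.group(1)), float(rate.group(1)) / 100


def flag_high_host(logs, max_host_fraction=0.9):
    """Sample IDs whose host fraction exceeds a hypothetical cutoff."""
    flagged = []
    for sample_id, text in sorted(logs.items()):
        _, fraction = host_fraction_from_bowtie2_log(text)
        if fraction > max_host_fraction:
            flagged.append(sample_id)
    return flagged
```
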
Paired metagenome normalization: scoped when DNA is available
Paired DNA+RNA enables copy-number correction for pathway activity (Franzosa et al., 2018). RNA-only projects document this limitation in the QC report.
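
The idea behind copy-number correction can be shown as a per-feature RNA/DNA ratio for one paired sample. This simplified sketch only demonstrates the arithmetic. HUMAnN's own `humann_rna_dna_norm` utility is the documented route for real tables. The proportions-based scaling, the pseudocount, and the choice to return `None` for features absent from DNA are all assumptions of this illustration.

```python
import math


def relative(table):
    """Scale a sample's abundances to proportions."""
    total = sum(table.values())
    if total <= 0:
        raise ValueError("sample has no abundance")
    return {feature: value / total for feature, value in table.items()}


def log2_rna_dna_ratio(rna, dna, pseudocount=1e-6):
    """Per-feature log2(RNA proportion / DNA proportion) for one paired sample.

    Features absent from the DNA profile return None: without a copy-number
    estimate, expression cannot be separated from abundance.
    """
    rna_rel, dna_rel = relative(rna), relative(dna)
    ratios = {}
    for feature in sorted(set(rna_rel) | set(dna_rel)):
        dna_value = dna_rel.get(feature, 0.0)
        if dna_value == 0.0:
            ratios[feature] = None
            continue
        rna_value = rna_rel.get(feature, 0.0)
        # The pseudocount keeps features with zero RNA finite
        ratios[feature] = math.log2((rna_value + pseudocount) / (dna_value + pseudocount))
    return ratios
```

A positive ratio marks a feature transcribed above its gene-copy share, and a negative ratio marks one transcribed below it.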

Common questions

What is the minimum sequencing depth and replicate count for metatranscriptomics?

For human gut metatranscriptomes, Westreich et al. (2016) recommend ribo-depleted, 100 bp paired-end sequencing with 40–50 million raw reads per sample—yielding roughly 5–10 million annotated reads and >90% accuracy on low-abundance transcript estimates. At least three biological replicates per condition are recommended for differential testing; fewer are flagged at kickoff. Environmental or low-biomass matrices may require project-specific depth targets; Pepkio confirms thresholds at kickoff based on sample matrix and study goals.
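
The Westreich et al. figures imply that only about 12.5–20% of raw reads end up annotated (5 of 40 million, or 10 of 50 million). This back-of-the-envelope sketch inverts that logic to estimate raw depth from an annotated-read target. Every rate it takes is a hypothetical input to be replaced with pilot or project data, not a published constant.

```python
import math


def raw_reads_needed(target_annotated, host_fraction, rrna_fraction,
                     annotation_rate, qc_pass_rate=1.0):
    """Raw reads needed for target_annotated reads to survive QC, filtering, and annotation.

    All rates are fractions in [0, 1] and are study-specific assumptions.
    """
    for name, value in [("host_fraction", host_fraction), ("rrna_fraction", rrna_fraction),
                        ("annotation_rate", annotation_rate), ("qc_pass_rate", qc_pass_rate)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    # Fraction of raw reads that end up annotated after each successive loss
    retained = qc_pass_rate * (1 - host_fraction) * (1 - rrna_fraction) * annotation_rate
    if retained == 0:
        raise ValueError("no reads would survive with these rates")
    return math.ceil(target_annotated / retained)
```

Assume no host reads, half of the remaining reads lost to rRNA, and a 25% annotation rate. Under those assumptions, a 5 million annotated-read target returns 40,000,000 raw reads. Raising the host fraction to 0.5 doubles the requirement to 80,000,000.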

Can you analyze low-quality or low-yield RNA libraries?

Yes, with caveats documented in the QC report. Samples with low RIN, insufficient reads after host and rRNA filtering, or high residual rRNA fractions may lack power for rare pathway detection (Ojala et al., 2023). Outlier samples in ordination are flagged before differential testing; re-sequencing is discussed when yield threatens the study question.

Do you support Illumina and MGI DNBSEQ metatranscriptomic data?

Yes. Pepkio processes ribo-depleted Illumina FASTQs from NovaSeq X, 6000, NextSeq 2000, and HiSeq instruments using the standard workflow. MGI DNBSEQ-T7, G400, and G99 FASTQs are processed when scoped at kickoff with adapter and QC validation in the report. Pre-built HUMAnN or MetaPhlAn outputs from either platform can be imported when upstream processing is complete.

How long does metatranscriptomics analysis take at Pepkio?

A standard project (roughly 4–24 samples, one primary contrast, profiling through differential activity) typically completes in 5–8 weeks from data receipt. Multi-contrast designs, paired DNA+RNA normalization, assembly-based quantification, or >24 samples may extend the timeline. Milestone check-ins occur during QC, after profiling, and before delivery; exact timelines are confirmed at kickoff.

How do you handle batch effects across sequencing runs or rRNA depletion kit lots?

When batch is known and not fully confounded with condition, Pepkio includes batch as a covariate in MaAsLin2 models (Mallick et al., 2021). PCA and correlation heatmaps are reviewed before modeling. rRNA depletion kit lot and extraction batch are recorded in sample_manifest.csv; post-hoc correction beyond the design formula is scoped separately when required.
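
Whether batch can enter the model as a covariate depends on how batch crosses with condition. This sketch shows that pre-modeling check. It assumes per-sample batch and condition labels held as dicts, and the function names and labels are hypothetical.

```python
from collections import defaultdict


def conditions_per_batch(sample_batch, sample_condition):
    """Map each batch to the set of conditions observed in it."""
    groups = defaultdict(set)
    for sample_id, batch in sample_batch.items():
        groups[batch].add(sample_condition[sample_id])
    return dict(groups)


def confounding_status(sample_batch, sample_condition):
    """Classify batch vs condition as 'crossed', 'partially confounded', or 'fully confounded'."""
    groups = conditions_per_batch(sample_batch, sample_condition)
    single = [batch for batch, conds in groups.items() if len(conds) == 1]
    if groups and len(single) == len(groups):
        # Every batch holds one condition: condition is determined by batch,
        # so a batch term and a condition term cannot both be estimated.
        return "fully confounded"
    if single:
        return "partially confounded"
    return "crossed"
```

Only the fully confounded case makes a batch covariate unusable. Partially confounded designs remain estimable, with less information about the condition effect.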

Do I own the code — and in what format is it delivered?

Yes — you retain full ownership of all code, scripts, and results. Pepkio delivers commented R and Python scripts with conda lockfiles or sessionInfo() exports so you can rerun agreed stages on Linux or HPC. Tables use standard .csv and .tsv formats; deliverables are organized by pipeline stage with README instructions. R Markdown or Jupyter notebooks are available on request.

Can I be involved during analysis?

Yes. Checkpoint reviews occur after QC, after host/rRNA filtering, and before final delivery. Within agreed scope, you can review metadata, covariate choices, filtering thresholds, and contrast definitions before final statistics. A PhD-level scientific contact leads the project, coordinates milestone feedback, and records decisions in the shared project file throughout the engagement.

What does post-delivery reviewer support include?

Post-delivery support covers clarification of methods, QC thresholds, database builds, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Methods drafts cover analyses Pepkio performed. Substantial new reviewer requests—additional contrasts, assembly-based requantification, or new covariate models—are scoped as separate milestones with updated pricing and timeline estimates.

Is co-authorship required?

No. Pepkio does not require co-authorship unless explicitly discussed and agreed in writing before project start. We operate as a fee-for-service CRO with no authorship conditions in standard statements of work. Acknowledgment of bioinformatics support in Methods or Acknowledgments is standard practice on our projects and appreciated by our team.

Do I need paired shotgun metagenomic DNA for metatranscriptomics?

No, but paired DNA improves interpretation. Metagenomic DNA reports gene presence and copy number; metatranscriptomics reports expression. Franzosa et al. (2018) showed that copy-number normalization with paired DNA distinguishes transcriptional upregulation from abundance changes. Pepkio analyzes RNA-only projects and documents this limitation; paired DNA normalization is scoped when metagenomic FASTQs or profiles are available.

How do you handle remaining rRNA after kit depletion?

Pepkio runs SortMeRNA 4.3.7 against SILVA 138 after experimental ribodepletion to remove residual rRNA reads (Kopylova et al., 2012; Westreich et al., 2016). rRNA fractions before and after filtering are reported per sample. Samples with high post-filter rRNA or low mRNA yield are flagged in the QC report before HUMAnN profiling.
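
This sketch builds the per-sample rRNA report with the rrna_filter_summary.csv columns listed in step 4. Its inputs are assumed to be read counts before filtering and reads assigned to rRNA, for example as taken from SortMeRNA logs. The log parsing is omitted, and both flagging thresholds are hypothetical placeholders.

```python
import csv
import io

COLUMNS = ["sample_id", "reads_pre_filter", "rrna_reads", "rrna_fraction", "reads_post_filter"]


def rrna_filter_summary(counts, max_rrna_fraction=0.5, min_post_filter_reads=5_000_000):
    """Build rrna_filter_summary.csv text and a list of flagged samples.

    counts maps sample_id -> (reads_pre_filter, rrna_reads). Both thresholds are
    hypothetical placeholders, not project defaults.
    """
    rows, flagged = [], []
    for sample_id, (pre, rrna) in sorted(counts.items()):
        if not 0 <= rrna <= pre:
            raise ValueError(f"{sample_id}: rrna_reads must lie between 0 and reads_pre_filter")
        fraction = rrna / pre if pre else 0.0
        post = pre - rrna  # reads kept for HUMAnN profiling
        rows.append([sample_id, pre, rrna, round(fraction, 4), post])
        # Flag a high rRNA share or too few putative mRNA reads left
        if fraction > max_rrna_fraction or post < min_post_filter_reads:
            flagged.append(sample_id)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue(), flagged
```
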

Can you quantify virulence, AMR, or custom pathway expression?

Yes, when scoped at kickoff. HUMAnN gene-family tables can be filtered to client-specified gene sets annotated against CARD, VFDB, or custom reference databases. Assembly-based Salmon quantification and co-expression networks are supported as bespoke extensions beyond the standard HUMAnN workflow; annotation scope, reference versions, and feasibility are confirmed before analysis begins.
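
Filtering a gene-family table to a client gene set can be sketched as below. The sketch assumes the tab-separated HUMAnN layout, in which stratified rows are written as `ID|taxon`. It also assumes the gene set has already been mapped to UniRef90 IDs; mapping from CARD or VFDB is outside this sketch, and the function name is hypothetical.

```python
import csv
import io


def filter_gene_families(tsv_text, gene_ids):
    """Keep the header plus rows whose UniRef ID (text before any '|') is in gene_ids.

    Stratified rows such as 'UniRef90_X|g__Genus.s__species' are kept alongside
    their community totals.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    kept = [row for row in reader if row and row[0].split("|", 1)[0] in gene_ids]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue()
```
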

Related services

  • Shotgun metagenomics: Paired DNA for gene-copy normalization and species-level catalog alongside active expression.
  • 16S amplicon: Cost-effective taxonomic profiling when functional expression data are not required.
  • Bulk RNA-seq: Host transcriptome profiling alongside microbiome activity in the same cohort.
  • Metabolomics: Small-molecule validation of pathway activity inferred from metatranscriptomic data.
  • Experimental design: Replicate planning and sequencing depth estimation before library prep.

References
  1. Zhang Y, Thompson KN, Branck T, et al. Metatranscriptomics for the Human Microbiome and Microbial Community Functional Profiling. Annual Review of Biomedical Data Science. 2021;4:279–311. https://doi.org/10.1146/annurev-biodatasci-031121-103035 (PMID: 34465175)
  2. Franzosa EA, McIver LJ, Rahnavard G, et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nature Methods. 2018;15(11):962–968. https://doi.org/10.1038/s41592-018-0176-y (PMID: 30377376)
  3. Westreich ST, Korf I, Mills DA, Lemay DG. SAMSA: a comprehensive metatranscriptome analysis pipeline. BMC Bioinformatics. 2016;17:399. https://doi.org/10.1186/s12859-016-1270-8 (PMID: 27687690)
  4. Ojala T, Häkkinen A-E, Kankuri E, Kankainen M. Current concepts, advances, and challenges in deciphering the human microbiota with metatranscriptomics. Trends in Genetics. 2023;39(9):686–702. https://doi.org/10.1016/j.tig.2023.05.004
  5. Javdan B, Lopez JG, Chankhamjon P, et al. Personalized mapping of drug metabolism by the human gut microbiome. Cell. 2020;181(7):1661–1679.e22. https://doi.org/10.1016/j.cell.2020.05.001 (PMID: 32526207)
  6. Becattini S, Sorbara MT, Kim SG, et al. Rapid transcriptional and metabolic adaptation of intestinal microbes to host immune activation. Cell Host & Microbe. 2021;29(3):378–393.e5. https://doi.org/10.1016/j.chom.2021.01.003 (PMID: 33539766)
  7. Hatch A, Horne J, Toma R, et al. A robust metatranscriptomic technology for population-scale studies of diet, gut microbiome, and human health. International Journal of Genomics. 2019;2019:1718741. https://doi.org/10.1155/2019/1718741 (PMID: 31662956)
  8. Blanco-Míguez A, Beghini F, Cumbo F, et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nature Biotechnology. 2023;41(11):1633–1644. https://doi.org/10.1038/s41587-023-01688-w (PMID: 36823356)
  9. Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. 2012;28(24):3211–3217. https://doi.org/10.1093/bioinformatics/bts611 (PMID: 23071270)
  10. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology. 2019;20(1):257. https://doi.org/10.1186/s13059-019-1891-0 (PMID: 31779668)
  11. Lu J, Breitwieser FP, Thielen P, Salzberg SL. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science. 2017;3:e104. https://doi.org/10.7717/peerj-cs.104
  12. Mallick H, Rahnavard G, McIver LJ, et al. Multivariable association discovery in population-scale meta-omics studies. PLOS Computational Biology. 2021;17(11):e1009442. https://doi.org/10.1371/journal.pcbi.1009442 (PMID: 34784344)
  13. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nature Communications. 2020;11:3514. https://doi.org/10.1038/s41467-020-17041-7 (PMID: 32665548)
  14. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods. 2017;14(4):417–419. https://doi.org/10.1038/nmeth.4197 (PMID: 28263959)
  15. Meyer F, Fritz A, Deng Z-L, et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nature Methods. 2022;19(4):429–440. https://doi.org/10.1038/s41592-022-01431-4 (PMID: 35396482)
  16. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. https://doi.org/10.1093/bioinformatics/btw354 (PMID: 27312411)
  17. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. https://doi.org/10.1093/bioinformatics/bty624 (PMID: 30423086)

Let's Talk About Your Science

Tell us:

  • Your biological question
  • Data type and size
  • Timeline constraints

We'll tell you:

  • What's feasible
  • How long it will take
  • Exactly what it will cost

Contact us to start with a free consultation. Need everyday bench calculators? Try our free lab tools.