
Metabolomics Analysis Services — Version-Pinned LC-MS and GC-MS Workflows from Raw Spectra to Annotated Features

Metabolomics measures small-molecule abundance in biofluids, cells, or tissues to profile biochemical state at the phenotype level. Pepkio's metabolomics analysis service delivers version-pinned pipelines from raw mzML or vendor exports to annotated feature tables, QC reports, PDF/SVG figures, documented R code, and a Methods draft for academic, biotech, and pharma teams. Custom workflows (non-standard inputs, outputs, or analyses beyond standard untargeted differential-feature testing) are scoped at kickoff.

Key facts

Key facts about metabolomics analysis
Fact	Value
Data types supported	LC-MS/MS (Thermo `.raw`, SCIEX `.wiff`, Agilent `.d`, Bruker `.d`, Waters `.raw`, mzML/mzXML); GC-MS and vendor peak tables on request; NMR bucket tables on request
Reference databases and spectral libraries	HMDB 5.0; METLIN; KEGG; LIPID MAPS; GNPS spectral libraries; organism-specific in-house libraries on request
Primary tools (with versions)	xcms 4.2.3; MS-DIAL 5.1; MZmine 3.9; MetaboAnalystR 4.0; limma 3.60; ropls 1.32; MOFA2 1.12; mixOmics 6.28 — see Tools table
Typical turnaround range	2–4 weeks (standard untargeted LC-MS cohort, one contrast); 4–8 weeks (GC-MS extensions, multi-omics integration, or multi-batch correction) — confirmed at kickoff
Deliverable formats	Feature/intensity matrices (`.csv`, `.tsv`); annotation tables with confidence tiers; PDF/SVG PCA, PLS-DA, and volcano plots; pathway enrichment tables; HTML QC report; commented R scripts; Methods draft
Regulatory/reproducibility standards followed	Alseekh et al. MS metabolomics reporting guidelines; version-pinned conda or `renv` environments; sample manifests; optional private Git or Zenodo archival on request
Custom / bespoke analysis	Non-standard inputs, contrasts, cross-cohort harmonization, multi-omics extensions, or client-specified ML models scoped at kickoff

Key terms: A feature is an m/z–retention time (RT) pair before metabolite assignment; MS2 (tandem MS) fragments precursor ions to support structural annotation. Untargeted profiling detects all measurable features; targeted assays quantify predefined compounds. A QC sample (a pooled reference injected repeatedly through the run) monitors drift. Annotation confidence is reported in tiers, from MS/MS library matches down to putative formula assignments (Alseekh et al., 2021).
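To make these key terms concrete, here is a minimal, illustrative Python sketch (Pepkio's delivered scripts are R) that treats a feature as an m/z–RT pair and shows why an MS1 mass match on its own stays putative. The `Feature` class, the `ms2_match` flag, the 5 ppm tolerance, and the tier labels are assumptions for this example, not Pepkio parameters or the Alseekh et al. levels verbatim; the proton mass and the caffeine monoisotopic mass are standard values.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da; used for singly charged [M+H]+ / [M-H]- ions


@dataclass(frozen=True)
class Feature:
    """An untargeted feature: an m/z-RT pair, not yet assigned to a metabolite."""
    mz: float                # observed precursor m/z
    rt: float                # retention time in seconds
    ms2_match: bool = False  # hypothetical flag: MS2 spectrum matched a library spectrum


def adduct_mz(neutral_mass, mode="pos"):
    """Expected m/z of [M+H]+ (positive mode) or [M-H]- (negative mode)."""
    return neutral_mass + PROTON_MASS if mode == "pos" else neutral_mass - PROTON_MASS


def annotate(feature, name, neutral_mass, mode="pos", ppm_tol=5.0):
    """Return (name, tier) if the feature's m/z matches the candidate within ppm_tol,
    else None. Tier labels are illustrative placeholders, not a published scheme."""
    expected = adduct_mz(neutral_mass, mode)
    ppm = abs(feature.mz - expected) / expected * 1e6
    if ppm > ppm_tol:
        return None
    tier = "MS/MS library match" if feature.ms2_match else "putative (MS1 mass only)"
    return name, tier


# Caffeine (C8H10N4O2, monoisotopic mass 194.080376 Da) as an example candidate.
print(annotate(Feature(mz=195.0877, rt=150.0), "caffeine", 194.080376))
# ('caffeine', 'putative (MS1 mass only)')
```

The same m/z could match several isomers, which is why a mass match alone is kept at the putative tier until MS/MS or an authentic standard supports it.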

What is metabolomics?

Metabolomics is the large-scale measurement of small molecules (amino acids, lipids, organic acids, and other metabolites) in biological samples. Because metabolites sit closest to cellular phenotype, metabolomics helps identify which biochemical pathways are perturbed under disease, treatment, or environmental exposure (Reinhold et al., 2019). Untargeted liquid chromatography–mass spectrometry (LC-MS) can detect thousands of features per run, though confident annotation remains a bottleneck (Alseekh et al., 2021). HMDB 5.0 catalogs 217,920 metabolite entries for human studies (Wishart et al., 2022).

What metabolomics analysis can answer

Published examples of biological questions metabolomics resolves:

Which serum metabolites distinguish metastatic colorectal cancer from healthy controls? Martín-Blázquez et al. (2019): untargeted LC-HRMS on 65 metastatic CRC and 60 control sera identified altered sphingolipids, glycerophospholipids, and an endocannabinoid.
Which urinary metabolites associate with type 2 diabetes? Salihovic et al. (2020): UPLC-MS in 1,424 Swedish adults linked lower 3-hydroxyundecanoyl-carnitine to prevalent T2D; six metabolites improved incident prediction (C-statistic 0.866 to 0.892).
How do metabolites shift with microbiome and host transcriptome in IBD? Lloyd-Price et al. (2019): 132 longitudinal IBD subjects with integrated metabolomics, metagenomics, and host transcriptomics linked metabolic shifts to disease activity.
Which pancreatic cancer metabolite biomarkers replicate across studies? Roth & Powers (2022): 24 PDAC studies; 10 metabolites validated across cohorts, but 87% of 655 proposed biomarkers appeared in only one study.
When should metabolomics complement transcript or protein layers in ccRCC? Clark et al. (2019): proteogenomic profiling of 103 tumors and 80 normal tissues showed oxidative phosphorylation dysregulation at the protein level not apparent from RNA-seq—motivating matched metabolomics when sample IDs align.

Services included in this category

Pepkio's metabolomics category covers untargeted LC-MS/GC-MS profiling and multi-omics integration, each with a dedicated service page describing inputs, tools, and deliverables.

Metabolomics services offered by Pepkio
Service	Description	Primary tools
Untargeted metabolomics	Peak detection, alignment, normalization, annotation, and differential feature testing from LC-MS or GC-MS raw data or vendor peak tables	xcms 4.2.3, MS-DIAL 5.1, MZmine 3.9, MetaboAnalystR 4.0, limma 3.60
Multi-omics integration	Joint modeling of metabolomics with transcriptomics, proteomics, or metagenomics on matched samples to identify shared pathways and cross-layer biomarkers	MOFA2 1.12, mixOmics DIABLO 6.28, correlation and WGCNA modules on request

What Pepkio delivers

Every project returns processed data tables, QC-audited figures, version-pinned code, and a Methods draft.

Processed data and figures

`feature_matrix.csv`, `annotation_table.csv` with Alseekh-aligned annotation tiers, `diff_results_<contrast>.csv`; PCA/PLS-DA and volcano plots (PDF/SVG); pathway enrichment tables

QC, code, and support

`sample_manifest.csv`, `qc_summary.html`, `analysis_parameters.yaml`; commented R scripts with conda/`renv` locks, README, Methods draft; milestone check-ins; reviewer clarification within agreed scope (typically ≤20% of deliverables)

Non-standard contrasts or figures are scoped at kickoff.

How the analysis works — step by step

1. Scope study design and select preprocessing pipeline
Confirm the biological question, sample matrix (e.g., serum, urine, tissue), platform, contrasts, and QC-sample design; flag any batch that is confounded with condition (Alseekh et al., 2021; Reinhold et al., 2019).
Tools and outputs
Output: signed scope with pipeline choice (xcms, MS-DIAL, or MZmine)
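The confounding check in step 1 can be sketched as a batch-by-condition cross-tabulation. In this illustrative Python example, the function names and the `(sample_id, batch, condition)` tuple layout are hypothetical. It flags batches that hold a single condition and reports when no batch contains more than one condition; in that case adjusting for batch would also remove the condition effect.

```python
from collections import defaultdict


def batch_condition_counts(samples):
    """Cross-tabulate sample counts per batch and condition from
    (sample_id, batch, condition) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    for _sample_id, batch, condition in samples:
        counts[batch][condition] += 1
    return {batch: dict(conds) for batch, conds in counts.items()}


def single_condition_batches(samples):
    """Batches containing only one condition; within them, batch and condition are aliased."""
    return sorted(b for b, conds in batch_condition_counts(samples).items() if len(conds) == 1)


def is_fully_confounded(samples):
    """True when no batch contains more than one condition, so conditions can never
    be compared within a batch."""
    return all(len(conds) == 1 for conds in batch_condition_counts(samples).values())


design = [("S1", "B1", "case"), ("S2", "B1", "control"),
          ("S3", "B2", "case"), ("S4", "B2", "control")]
print(is_fully_confounded(design))  # False: both conditions appear in each batch
```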
2. Validate inputs and record metadata
Verify file integrity, polarity, and sample IDs; convert vendor formats to mzML when needed; confirm pooled QC injections.
Tools and outputs
Tools used: msconvert (ProteoWizard 3.0.20360)
Output: sample_manifest.csv
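Step 2's metadata checks can be sketched as a manifest validator. The column names below (`sample_id`, `file`, `condition`, `batch`, `polarity`, `sample_type`) are a hypothetical schema, not the actual layout of `sample_manifest.csv`. The sketch only checks required columns, empty or duplicate sample IDs, unexpected polarity labels, and the presence of at least one pooled QC injection; file-integrity checks would sit alongside it.

```python
import csv
import io

# Hypothetical manifest schema; real column names are agreed at kickoff.
REQUIRED = {"sample_id", "file", "condition", "batch", "polarity", "sample_type"}
POLARITIES = {"pos", "neg"}


def validate_manifest(text):
    """Return a list of human-readable problems found in a sample manifest (CSV text)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {', '.join(sorted(missing))}"]

    problems, seen, n_qc = [], set(), 0
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        sid = (row["sample_id"] or "").strip()
        if not sid:
            problems.append(f"row {row_no}: empty sample_id")
        elif sid in seen:
            problems.append(f"row {row_no}: duplicate sample_id {sid}")
        seen.add(sid)
        if (row["polarity"] or "").strip().lower() not in POLARITIES:
            problems.append(f"row {row_no}: unknown polarity {row['polarity']!r}")
        if (row["sample_type"] or "").strip().lower() == "qc":
            n_qc += 1
    if n_qc == 0:
        problems.append("no pooled QC injections listed")
    return problems


manifest = """sample_id,file,condition,batch,polarity,sample_type
S1,S1.mzML,case,B1,pos,study
S1,S2.mzML,control,B1,pos,study
QC1,QC1.mzML,pooled,B1,pos,qc
"""
print(validate_manifest(manifest))  # ['row 3: duplicate sample_id S1']
```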
3. Run raw spectral QC
Inspect TIC and base-peak chromatograms, mass accuracy, and blank flags.
Tools and outputs
Tools used: MetaboAnalystR 4.0 QC module (Pang et al., 2024)
Output: raw_qc_report.html
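Mass accuracy in step 3 is usually expressed as a parts-per-million (ppm) error against a known reference ion. The following is a minimal Python sketch, assuming a spiked reference compound observed in each QC injection and a 5 ppm acceptance limit. Both are assumptions, since real limits depend on the instrument. Caffeine [M+H]+ serves only as a worked example.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_accuracy_report(observations, theoretical, max_abs_ppm=5.0):
    """observations: list of (injection, compound, observed m/z) for reference ions.
    theoretical: {compound: theoretical m/z}.
    Returns (injection, compound, ppm rounded to 0.01, passed) rows."""
    rows = []
    for injection, compound, observed in observations:
        err = ppm_error(observed, theoretical[compound])
        rows.append((injection, compound, round(err, 2), abs(err) <= max_abs_ppm))
    return rows


# Caffeine [M+H]+ (194.080376 + 1.007276 Da) as a hypothetical spiked reference ion.
theoretical = {"caffeine_MH": 195.087652}
obs = [("QC_01", "caffeine_MH", 195.0880), ("QC_12", "caffeine_MH", 195.0897)]
for row in mass_accuracy_report(obs, theoretical):
    print(row)
```

A late QC failing while early ones pass, as in this toy input, is the pattern that points to calibration drift rather than a one-off bad injection.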
4. Detect, align, and integrate peaks
Extract features with consistent m/z and RT tolerances; gap-fill missing peaks.
Tools and outputs
Tools used: xcms 4.2.3 (Smith et al., 2006); MS-DIAL 5.1 (Tsugawa et al., 2015); or MZmine 3.9 (Heuckeroth et al., 2024)
Output: peak_matrix.csv
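Step 4's alignment across samples (correspondence) can be illustrated with greedy grouping by m/z and RT tolerance. This Python sketch is a simplified stand-in written for illustration. It is not the density-based grouping that xcms performs with `groupChromPeaks` and `PeakDensityParam`, and it does not re-integrate gaps the way `fillChromPeaks` does; it only marks them as `None`. The 5 ppm and 10 s tolerances are assumed defaults.

```python
def group_peaks(peaks_by_sample, ppm_tol=5.0, rt_tol=10.0):
    """Greedy correspondence across samples.

    peaks_by_sample: {sample_id: [(mz, rt_seconds, intensity), ...]}
    Each peak joins the first existing feature whose mean m/z and mean RT are within
    tolerance and that has no peak from the same sample yet; otherwise it starts a new
    feature. Returns (sample_ids, rows); each row is (mean_mz, mean_rt, intensities),
    with None marking a gap that a gap-filling step would re-integrate.
    """
    features = []
    all_peaks = sorted(
        (mz, rt, intensity, sample)
        for sample, peaks in peaks_by_sample.items()
        for mz, rt, intensity in peaks
    )
    for mz, rt, intensity, sample in all_peaks:
        for feat in features:
            center_mz = sum(feat["mz"]) / len(feat["mz"])
            center_rt = sum(feat["rt"]) / len(feat["rt"])
            if (sample not in feat["intensity"]
                    and abs(mz - center_mz) / center_mz * 1e6 <= ppm_tol
                    and abs(rt - center_rt) <= rt_tol):
                feat["mz"].append(mz)
                feat["rt"].append(rt)
                feat["intensity"][sample] = intensity
                break
        else:  # no existing feature matched: start a new one
            features.append({"mz": [mz], "rt": [rt], "intensity": {sample: intensity}})

    samples = sorted(peaks_by_sample)
    rows = [
        (sum(f["mz"]) / len(f["mz"]), sum(f["rt"]) / len(f["rt"]),
         [f["intensity"].get(s) for s in samples])
        for f in features
    ]
    return samples, rows


peaks = {
    "S1": [(181.0707, 312.0, 1.0e5), (195.0877, 150.0, 2.0e5)],
    "S2": [(181.0712, 318.0, 1.2e5)],
}
samples, rows = group_peaks(peaks)
for mz, rt, intensities in rows:
    print(f"{mz:.4f}\t{rt:.1f}\t{intensities}")
```

Because each peak joins the first feature within tolerance, results can depend on processing order; dedicated tools use density or clustering criteria instead, which is one reason pipeline choice changes feature sets.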
5. Normalize and diagnose batch and run-order effects
Apply QC-RLSC (QC-based robust LOESS signal correction) for run-order drift and/or probabilistic quotient normalization (PQN) for dilution differences; review PCA for condition versus batch (Reinhold et al., 2019).
Tools and outputs
Tools used: MetaboAnalystR 4.0; limma `removeBatchEffect` when batch is not confounded with condition (Ritchie et al., 2015)
Output: normalized_matrix.csv; PCA plots
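Probabilistic quotient normalization, used in step 5, estimates one dilution factor per sample as the median of its feature-wise quotients against a reference profile, then divides the sample by that factor. The following is a minimal Python sketch, assuming a samples-by-features list of intensities with `None` for missing values. Defaulting to the feature-wise median as the reference is a choice made for this example; a median over pooled QC injections is a common alternative.

```python
from statistics import median


def pqn_normalize(samples, reference=None):
    """Probabilistic quotient normalization (PQN).

    samples: list of intensity vectors (one per sample, same feature order; None = missing).
    reference: reference profile; defaults to the feature-wise median across samples.
    Returns (normalized_samples, dilution_factors).
    """
    n_features = len(samples[0])
    if reference is None:
        reference = [
            median(x[j] for x in samples if x[j] is not None) for j in range(n_features)
        ]
    normalized, factors = [], []
    for x in samples:
        # Per-feature quotients against the reference; their median estimates dilution.
        quotients = [v / r for v, r in zip(x, reference) if v is not None and v > 0 and r > 0]
        factor = median(quotients)
        factors.append(factor)
        normalized.append([v / factor if v is not None else None for v in x])
    return normalized, factors


norm, factors = pqn_normalize([[20, 40, 60], [5, 10, 15]], reference=[10, 20, 30])
print(factors)  # [2.0, 0.5]
```

Using the median rather than the mean of quotients keeps a few genuinely changed metabolites from dragging the dilution estimate.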
6. Filter features and handle missing values
Remove low-abundance and blank-contaminated features; profile missingness by batch; document imputation before testing (Reinhold et al., 2019; Goh et al., 2023).
Tools and outputs
Output: filtered_matrix.csv; missingness heatmap
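Step 6 can be sketched as three operations on a feature-by-sample table: per-batch missingness profiling, a sparsity and blank-ratio filter, and half-minimum imputation (one form of small-value imputation). In this illustrative Python sketch, the 30% blank ratio and 50% missingness thresholds are assumptions, not Pepkio defaults, and the dictionary layout is hypothetical.

```python
from statistics import mean


def missing_fraction_by_batch(values, sample_batch):
    """Fraction of study samples with no value for this feature, per batch."""
    totals, missing = {}, {}
    for sample, batch in sample_batch.items():
        totals[batch] = totals.get(batch, 0) + 1
        missing[batch] = missing.get(batch, 0) + (values.get(sample) is None)
    return {batch: missing[batch] / totals[batch] for batch in totals}


def filter_and_impute(features, sample_batch, blank_ids,
                      max_blank_ratio=0.3, max_missing=0.5):
    """features: {feature_id: {sample_id: intensity or None}}, blanks included.
    sample_batch: {study_sample_id: batch}; blank_ids: process-blank sample IDs.
    Drops features observed in too few study samples or whose mean blank signal exceeds
    max_blank_ratio x the mean study signal, then fills remaining gaps with half the
    feature's minimum observed value. Returns (kept, missingness_by_feature)."""
    kept, missingness = {}, {}
    for fid, values in features.items():
        missingness[fid] = missing_fraction_by_batch(values, sample_batch)
        observed = [values[s] for s in sample_batch if values.get(s) is not None]
        if len(observed) < (1 - max_missing) * len(sample_batch):
            continue  # too sparse across study samples
        blank_signal = [values.get(b) or 0.0 for b in blank_ids]
        if blank_signal and mean(blank_signal) > max_blank_ratio * mean(observed):
            continue  # likely background or carry-over contamination
        fill = min(observed) / 2
        kept[fid] = {s: values[s] if values.get(s) is not None else fill for s in sample_batch}
    return kept, missingness
```

Keeping the per-batch missingness report even for dropped features is what makes batch-associated gaps visible before any imputation choice is locked in.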
7. Test differential abundance and multivariate structure
Model contrasts with limma; run PLS-DA or OPLS-DA when scoped (Ritchie et al., 2015; Thévenot et al., 2015; Eriksson et al., 2008).
Tools and outputs
Tools used: limma 3.60; ropls 1.32
Output: diff_results_<contrast>.csv; cross-validated PLS-DA metrics
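Step 7 reports effect sizes and false-discovery-controlled p-values per contrast. This illustrative Python sketch does not reproduce limma's moderated t-statistics. It shows only the two pieces around them. The first is a log2 fold change computed as a difference of mean log2 intensities, the same scale as limma's `logFC` when limma is run on log2 data. The second is Benjamini-Hochberg adjustment, the default `adjust.method` in limma's `topTable`.

```python
import math


def log2_fold_change(case, control):
    """Difference of mean log2 intensities (case minus control)."""
    def mean_log2(xs):
        return sum(math.log2(x) for x in xs) / len(xs)
    return mean_log2(case) - mean_log2(control)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(log2_fold_change([4, 16], [2, 2]))  # 2.0
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

When batch is not confounded with condition, limma's documentation recommends including batch in the design matrix for testing rather than fitting models to `removeBatchEffect` output, which is intended for visualization.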
8. Annotate features and run pathway enrichment
Match precursor m/z and, where acquired, MS/MS spectra against HMDB, METLIN, and KEGG; assign annotation tiers; run pathway over-representation analysis or metabolite set enrichment analysis (MSEA) (Alseekh et al., 2021; Pang et al., 2024).
Tools and outputs
Tools used: MetaboAnalystR 4.0; SIRIUS/CSI:FingerID on request
Output: annotation_table.csv; pathway_enrichment.csv
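Pathway over-representation in step 8 is commonly framed as a one-sided hypergeometric test. Given the annotated background and the metabolites called significant, the test asks how surprising the number that falls in a given pathway is. The following is a minimal stdlib sketch; the argument names are hypothetical. The background should be the annotated, detected metabolites rather than the whole database, because the choice of background changes the p-values.

```python
from math import comb


def ora_pvalue(k, pathway_size, n_significant, background_size):
    """One-sided hypergeometric over-representation p-value, P(X >= k).

    k: significant metabolites that fall in the pathway
    pathway_size: background metabolites that belong to the pathway
    n_significant: significant metabolites drawn from the background
    background_size: all annotated metabolites that could have been detected
    """
    K, n, N = pathway_size, n_significant, background_size
    upper = min(K, n)
    # Sum hypergeometric probabilities for k, k+1, ..., upper pathway hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 4 of 25 significant metabolites fall in a 12-member pathway
# drawn from 400 annotated, detected metabolites.
print(ora_pvalue(k=4, pathway_size=12, n_significant=25, background_size=400))
```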
9. Package figures, scripts, and Methods draft
Assemble deliverables with pinned versions; include MOFA2 or DIABLO integration outputs when matched omics matrices are in scope (Argelaguet et al., 2018; Singh et al., 2019).
Tools and outputs
Output: final deliverable bundle
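Packaging in step 9 can include a checksum manifest so a recipient can confirm the bundle is unchanged and see which tool versions produced it. The following is a minimal Python sketch, assuming a local bundle directory. The manifest layout and function names are hypothetical and do not represent the format of `analysis_parameters.yaml`. The version strings in the usage lines come from the Tools table.

```python
import hashlib
import json
from pathlib import Path


def build_bundle_manifest(bundle_dir, tool_versions):
    """SHA-256 checksum for every file under bundle_dir, plus pinned tool versions."""
    root = Path(bundle_dir)
    files = {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
    return {"tool_versions": dict(sorted(tool_versions.items())), "files": files}


def write_manifest(manifest, out_path):
    """Write the manifest as indented JSON; keep out_path outside the bundle so it is not hashed."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))


# Usage (hypothetical paths):
# manifest = build_bundle_manifest("deliverables/", {"xcms": "4.2.3", "limma": "3.60"})
# write_manifest(manifest, "deliverables_manifest.json")
```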

Tools and standards we use

Core stack for untargeted and multi-omics projects; versions pinned at kickoff and cited in the Methods draft.

Metabolomics tools and standards
Tool	Version	Role	Primary citation
xcms	4.2.3	LC-MS/GC-MS peak detection, alignment, and gap filling	Smith et al., 2006 — https://doi.org/10.1021/ac051437y
MS-DIAL	5.1	Vendor-agnostic peak picking, MS/MS deconvolution, and annotation	Tsugawa et al., 2015 — https://doi.org/10.1038/nmeth.3259
MZmine	3.9	Modular LC-MS/GC-MS processing and lipidomics workflows	Heuckeroth et al., 2024 — https://doi.org/10.1038/s41596-024-00996-y
MetaboAnalystR	4.0	Unified LC-MS preprocessing, statistics, and pathway analysis	Pang et al., 2024 — https://doi.org/10.1038/s41467-024-48009-6
limma	3.60	Linear models for differential feature testing; batch adjustment via `removeBatchEffect`	Ritchie et al., 2015 — https://doi.org/10.1093/nar/gkv007
ropls	1.32	PLS-DA and OPLS-DA multivariate modeling	Thévenot et al., 2015 — https://doi.org/10.1021/acs.jproteome.5b00354
MOFA2	1.12	Unsupervised multi-omics factor integration	Argelaguet et al., 2018 — https://doi.org/10.15252/msb.20178124
mixOmics (DIABLO)	6.28	Supervised multi-omics integration with outcome labels	Singh et al., 2019 — https://doi.org/10.1093/bioinformatics/bty1054
msconvert (ProteoWizard)	3.0.20360	Raw-to-mzML conversion for cross-platform ingestion	Chambers et al., 2012 — https://doi.org/10.1038/nbt.2376
HMDB	5.0	Human metabolite reference and spectral matching	Wishart et al., 2022 — https://doi.org/10.1093/nar/gkab1062

Common challenges — and how we handle them

Missing values, annotation uncertainty, and pipeline choice require documented decisions before differential testing.

Metabolite annotation confidence varies widely: MS1-only matches carry higher false-discovery risk than MS/MS-confirmed identifications (Alseekh et al., 2021). Pepkio assigns annotation tiers and reports putative versus confirmed counts in the QC summary.
Pipeline choice shifts feature sets materially: Only approximately 8% of features overlapped across four workflows (XCMS, MS-DIAL, MZmine, Compound Discoverer) on a shared LC-MS dataset (Aigensberger et al., 2025). Pepkio locks pipeline and parameters at kickoff.
Missing values are confounded with batch effects: Batch-effect associated missing values (BEAMs) can bias imputation and batch correction if the two steps are handled sequentially (Goh et al., 2023; Reinhold et al., 2019). Pepkio profiles missingness by batch and applies batch adjustment only when batch is not confounded with condition.
Run-order drift and ion suppression distort quantification: Signal intensity can drift with injection order and be suppressed by co-eluting matrix components (Alseekh et al., 2021; Reinhold et al., 2019). Pepkio requires pooled QC samples where feasible and applies QC-RLSC drift correction and/or PQN with run-order diagnostics.
Clinical biomarker claims lack cross-study replication: Meta-analysis of 24 PDAC studies found 87% of proposed biomarkers in single reports only (Roth & Powers, 2022). Pepkio recommends validation cohorts before biomarker claims.
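The QC-based drift correction used for run-order drift can be sketched for a single feature in three steps. First, fit the pooled-QC intensities against injection order. Next, divide every injection by that trend. Finally, rescale to the QC median. QC-RLSC fits the trend with LOESS, but this Python sketch swaps in linear interpolation between neighbouring QC injections to stay dependency-free. It is therefore a simplified stand-in rather than QC-RLSC itself, and it assumes positive intensities and at least one QC injection.

```python
from bisect import bisect_left
from statistics import median


def qc_drift_correct(run_order, intensities, is_qc):
    """Correct one feature for run-order drift using pooled QC injections.

    run_order: injection positions; intensities: raw values; is_qc: booleans.
    The QC trend is linearly interpolated between neighbouring QC injections and held
    flat before the first and after the last QC (a stand-in for a LOESS fit).
    """
    qc = sorted((pos, y) for pos, y, flag in zip(run_order, intensities, is_qc) if flag)
    if not qc:
        raise ValueError("at least one pooled QC injection is required")
    qc_x = [pos for pos, _ in qc]
    qc_y = [y for _, y in qc]
    target = median(qc_y)  # rescale corrected values to the QC median level

    def trend(pos):
        if pos <= qc_x[0]:
            return qc_y[0]
        if pos >= qc_x[-1]:
            return qc_y[-1]
        j = bisect_left(qc_x, pos)  # index of the first QC at or after pos
        x0, x1, y0, y1 = qc_x[j - 1], qc_x[j], qc_y[j - 1], qc_y[j]
        return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)

    return [y / trend(pos) * target for pos, y in zip(run_order, intensities)]


# QCs at positions 1, 3, 5 lose 10 units per injection; study samples drift the same way.
corrected = qc_drift_correct([1, 2, 3, 4, 5], [100, 90, 80, 70, 60],
                             [True, False, True, False, True])
print(corrected)  # [80.0, 80.0, 80.0, 80.0, 80.0]
```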

Common questions

What data do I need to provide for a metabolomics analysis project?

Provide raw MS files (`.raw`, `.wiff`, `.d`, or mzML), metadata with condition and batch columns, and the platform and polarity mode. Include pooled QC files when available (Alseekh et al., 2021; Reinhold et al., 2019). Pre-processed peak tables from MS-DIAL or MZmine are accepted when export parameters are documented.

How long does metabolomics analysis take at Pepkio?

Standard untargeted LC-MS projects (roughly 10–40 samples, one contrast) typically complete in 2–4 weeks. GC-MS extensions, multi-omics integration, multi-contrast designs, or large batch-correction studies may require 4–8 weeks. Exact timelines are confirmed at kickoff.

What do the deliverables look like?

You receive feature and annotation tables, differential-abundance results per contrast, PCA/PLS-DA and volcano figures (PDF/SVG), an HTML QC report, commented R scripts with conda or `renv` lock files, a README, and a Methods draft. Optional private Git repository or Zenodo archival is available on request.

Can you handle Thermo, SCIEX, Agilent, Bruker, or Waters LC-MS data?

Yes. Thermo `.raw`, Agilent `.d`, Bruker `.d`, and Waters `.raw` are converted to mzML via msconvert or imported in MS-DIAL when scoped. SCIEX `.wiff` can be processed in MS-DIAL 5.1 when that workflow is scoped at kickoff; mzML exports are also accepted. The Methods draft records preprocessing tool, m/z tolerance, and RT alignment parameters.

What if my data quality is poor or I have many missing features?

Pepkio documents TIC quality, missing-value fractions, and outlier samples in the QC report before differential testing. High missingness triggers a review of imputation strategy—small-value, k-nearest neighbours, or complete-case analysis (Reinhold et al., 2019; Goh et al., 2023). Re-analysis after sample re-measurement is discussed when QC threatens the study question.

Do you provide the code, and can I reproduce the results?

Yes. You retain full ownership of all code and results. Pepkio delivers commented R scripts with conda or `renv` lock files listing exact package versions, organized by pipeline stage with a README.

Can I be involved during the analysis?

Yes. Checkpoint reviews occur after raw-data QC, after normalization and batch diagnostics, and before final delivery. You can review contrast definitions, normalization, imputation, and annotation thresholds within agreed scope. A dedicated scientific contact leads the project.

What happens if a reviewer requests changes after delivery?

Post-delivery support covers clarification of methods, QC thresholds, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Substantial new analyses are scoped as separate milestones, consistent with Pepkio's reviewer-support policy.

Can you integrate metabolomics with my RNA-seq or proteomics data?

Yes, when sample IDs match across layers. Pepkio scopes multi-omics integration via MOFA2 for unsupervised factor discovery or mixOmics DIABLO when a supervised outcome label exists (Argelaguet et al., 2018; Singh et al., 2019). Cross-layer correlation and pathway concordance are documented in the Methods draft.
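Integration starts by restricting each layer to shared sample IDs. The following is a minimal Python sketch, assuming each layer is a mapping from sample ID to a feature vector (a hypothetical layout). It keeps the samples present in every layer and reports what each layer loses. DIABLO expects the same samples in every block, whereas MOFA2 can model samples missing from some views, so the strict intersection shown here is the conservative choice.

```python
def align_layers(layers):
    """layers: {layer_name: {sample_id: feature_vector}}.

    Returns (sample_order, aligned, dropped): the sorted IDs present in every layer,
    each layer's vectors in that shared order, and the IDs each layer loses.
    """
    shared = set.intersection(*(set(samples) for samples in layers.values()))
    order = sorted(shared)
    aligned = {name: [samples[s] for s in order] for name, samples in layers.items()}
    dropped = {name: sorted(set(samples) - shared) for name, samples in layers.items()}
    return order, aligned, dropped


layers = {
    "metabolomics": {"P1": [0.2, 1.1], "P2": [0.4, 0.9], "P3": [0.1, 1.3]},
    "rnaseq": {"P2": [5.0], "P3": [4.2], "P4": [6.1]},
}
order, aligned, dropped = align_layers(layers)
print(order)    # ['P2', 'P3']
print(dropped)  # {'metabolomics': ['P1'], 'rnaseq': ['P4']}
```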

Do you support custom analyses outside standard untargeted workflows?

Yes. Bespoke analyses—custom contrasts, cross-cohort harmonization, lipid class-specific workflows, network propagation, ML classifiers, or non-standard deliverables—are scoped at kickoff after a feasibility review.

Related services

Transcriptomics — Pathway concordance between gene expression and metabolite abundance.
Proteomics — Protein–metabolite interpretation on matched sample IDs.
Metagenomics — Microbiome context from shotgun or 16S profiling.
Statistical analysis — Replicate planning before sample acquisition.
Machine learning — Classifier development on metabolite panels.
Bioinformatics consulting — Platform and QC-sample design before a full project.

References

Alseekh S, Aharoni A, Brotman Y, et al. Mass spectrometry-based metabolomics: a guide for annotation, quantification and best reporting practices. Nature Methods. 2021;18(7):747–756. https://doi.org/10.1038/s41592-021-01197-1 (PMID: 34239102)
Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical Chemistry. 2006;78(3):779–787. https://doi.org/10.1021/ac051437y (PMID: 16448051)
Tsugawa H, Cajka T, Kind T, et al. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nature Methods. 2015;12(6):523–526. https://doi.org/10.1038/nmeth.3259 (PMID: 25938372)
Heuckeroth S, Damiani T, Smirnov A, et al. Reproducible mass spectrometry data processing and compound annotation in MZmine 3. Nature Protocols. 2024;19(9):2597–2641. https://doi.org/10.1038/s41596-024-00996-y (PMID: 38769143)
Pang Z, Xu L, Viau C, et al. MetaboAnalystR 4.0: a unified LC-MS workflow for global metabolomics. Nature Communications. 2024;15:3675. https://doi.org/10.1038/s41467-024-48009-6 (PMID: 38693118)
Reinhold D, Pielke-Lombardo H, Jacobson S, Ghosh D, Kechris K. Pre-analytic considerations for mass spectrometry based untargeted metabolomics data. Methods in Molecular Biology. 2019;1978:323–340. https://doi.org/10.1007/978-1-4939-9236-2_20 (PMID: 31119672)
Wishart DS, Guo AC, Oler E, et al. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Research. 2022;50(D1):D622–D631. https://doi.org/10.1093/nar/gkab1062 (PMID: 34986597)
Aigensberger M, Bueschl C, Castillo-Lopez E, et al. Modular comparison of untargeted metabolomics processing steps. Analytica Chimica Acta. 2025;1336:343491. https://doi.org/10.1016/j.aca.2024.343491 (PMID: 39788662)
Martín-Blázquez A, Díaz C, González-Flores E, et al. Untargeted LC-HRMS-based metabolomics to identify novel biomarkers of metastatic colorectal cancer. Scientific Reports. 2019;9:20198. https://doi.org/10.1038/s41598-019-55952-8 (PMID: 31882610)
Salihovic S, Broeckling CD, Ganna A, et al. Non-targeted urine metabolomics and associations with prevalent and incident type 2 diabetes. Scientific Reports. 2020;10:16474. https://doi.org/10.1038/s41598-020-72456-y (PMID: 33020500)
Lloyd-Price J, Arze C, Ananthakrishnan AN, et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019;569(7758):655–662. https://doi.org/10.1038/s41586-019-1237-9 (PMID: 31142855)
Roth HE, Powers R. Meta-analysis reveals both the promises and the challenges of clinical metabolomics. Cancers. 2022;14(16):3992. https://doi.org/10.3390/cancers14163992 (PMID: 36010984)
Clark DJ, Dhanasekaran SM, Petralia F, et al. Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell. 2019;179(4):964–983.e31. https://doi.org/10.1016/j.cell.2019.10.007 (PMID: 31675502)
Goh WWB, Hui HWH, Wong L. How missing value imputation is confounded with batch effects and what you can do about it. Drug Discovery Today. 2023;28(9):103661. https://doi.org/10.1016/j.drudis.2023.103661 (PMID: 37301250)
Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43(7):e47. https://doi.org/10.1093/nar/gkv007 (PMID: 25605792)
Eriksson L, Trygg J, Wold S. CV-ANOVA for significance testing of PLS and OPLS models. Journal of Chemometrics. 2008;22(11–12):594–600. https://doi.org/10.1002/cem.1187
Thévenot EA, Roux A, Xu Y, et al. Analysis of the human adult urinary metabolome variations with age, body mass index, and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. Journal of Proteome Research. 2015;14(8):3322–3335. https://doi.org/10.1021/acs.jproteome.5b00354 (PMID: 26088811)
Argelaguet R, Velten B, Arnol D, et al. Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets. Molecular Systems Biology. 2018;14(6):e8124. https://doi.org/10.15252/msb.20178124 (PMID: 29925568)
Singh A, Shannon CP, Gautier B, et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics. 2019;35(17):3055–3062. https://doi.org/10.1093/bioinformatics/bty1054 (PMID: 30657866)
Chambers MC, Maclean B, Burke R, et al. A cross-platform toolkit for mass spectrometry and proteomics. Nature Biotechnology. 2012;30(10):918–920. https://doi.org/10.1038/nbt.2376 (PMID: 23051804)


Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
