
Proteomics Analysis Services — Reproducible Quantification from DDA, DIA, Single-Cell MS, and Olink Assays

Proteomics analysis measures protein identity, abundance, and modification at scale using mass spectrometry (MS) or targeted immunoassays. Pepkio's proteomics analysis service delivers version-pinned pipelines, QC reports, differential-abundance tables, PDF/SVG figures, documented R and Python code, and a Methods draft for academic, biotech, and pharma teams. Custom workflows (non-standard inputs, outputs, or analyses beyond standard differential-protein testing) are scoped at kickoff.

Key facts

Key facts about proteomics analysis
Fact	Value
Data types supported	Thermo `.raw`, Bruker `.d`, SCIEX `.wiff` (converted to mzML when needed), mzML/mzXML; label-free quantification (LFQ), TMT/iTRAQ isobaric tags, SILAC, data-dependent acquisition (DDA), data-independent acquisition (DIA); Olink Normalized Protein eXpression (NPX) exports; pre-processed protein abundance matrices
Reference builds or standards used	UniProt Swiss-Prot reviewed proteomes (e.g., human UP000005640); CPTAC-style contaminant sequences; organism-specific FASTA on request; Olink panel lot validation reports
Primary tools (with versions)	MaxQuant 2.8.0.0; FragPipe 22+ with MSFragger 4.x; DIA-NN 2.0+; Perseus 2.0; MSstats 2.14; proDA 1.0; Philosopher 5.x; limma 3.60; proBatch 1.0
Typical turnaround range	2–4 weeks (standard LFQ or DIA cohort, one contrast); 4–8 weeks (TMT multiplex, single-cell MS, multi-panel Olink, phosphoproteomics extensions) — confirmed at kickoff
Deliverable formats	Protein and peptide matrices (`.tsv`, `.csv`); Perseus and MSstats contrast tables; PDF/SVG volcano plots, heatmaps, and PCA; HTML QC report; commented R/Python scripts; Methods draft
Regulatory/reproducibility standards followed	1% peptide and protein false discovery rate (FDR) defaults; version-pinned conda or R environments; full parameter logs; PRIDE/ProteomeXchange deposition support on request
Custom / bespoke analysis	Non-standard inputs, outputs, contrasts, enrichment methods, proteogenomic integration, or platform-specific extensions scoped at kickoff

Key terms: DDA (data-dependent acquisition) selects precursor ions for fragmentation in real time; DIA (data-independent acquisition) fragments all precursors within defined m/z windows. PEA (proximity extension assay) is Olink's dual-antibody immunoassay with DNA barcode readout. NPX is Olink's log₂-scaled relative quantification unit. MaxLFQ is MaxQuant's label-free quantification algorithm. MNAR (missing not at random) and MCAR (missing completely at random) describe different missing-value mechanisms in proteomics matrices (Lazar et al., 2016).

What is proteomics?

Proteomics is the large-scale identification and quantification of proteins in biological samples. It answers which proteins are present, how abundant they are, and how they are modified, questions that mRNA measurements alone cannot fully address. Unlike transcriptomics, proteomics directly measures the combined outcome of translation, protein stability, and post-translational modification. The HUPO Human Proteome Project reported that 18,138 of 19,411 GENCODE protein-coding genes (93%) have been credibly detected at the protein level, with 1,273 "missing proteins" still lacking high-stringency evidence (Adkins et al., 2024). BCC Research (2025) valued the global proteomics technologies market at $27.6 billion in 2024.

What proteomics analysis can answer

Published examples of biological questions proteomics resolves—often where RNA-seq alone is insufficient (Clark et al., 2019):

Which proteins are dysregulated in ccRCC beyond RNA-seq? Clark et al. (2019): 820 differential proteins in 103 tumors vs 80 normal adjacent tissues, including oxidative phosphorylation changes visible only at the protein level.
How heterogeneous are macrophage proteomes during differentiation? Specht et al. (2021): 3,042 proteins across 1,490 single cells (SCoPE2), revealing a continuous proteome gradient.
Which plasma proteins associate with incident coronary events? Wigren et al. (2016): decreased stem cell factor before events, measured by proximity extension assay.
Which phospho-signaling modules co-vary with genomic instability? Clark et al. (2019): phosphoproteomics linked kinase modules to a genomic-instability ccRCC subtype.
Can multiplex immunoassays detect neuroinflammatory CSF signatures? Bäckryd et al. (2017): distinct CSF vs plasma inflammatory profiles in fibromyalgia using a multiplex immunoassay panel.

Services included in this category

Proteomics services offered by Pepkio
Service	Description	Primary tools
DDA/DIA proteomics	Label-free, TMT/iTRAQ, or DIA quantification from LC-MS/MS raw files through peptide identification, protein inference, and differential abundance testing	MaxQuant, FragPipe, DIA-NN, MSstats
Single-cell proteomics	Carrier-based single-cell MS quantification with cell-state clustering and QC metrics aligned to community guidelines	SCoPE2-style workflows, DIA-NN, custom QC per Deutsch et al. (2023)
Olink proximity extension	Targeted multiplex immunoassay analysis from NPX exports with plate normalization, QC flagging, and differential protein testing	Olink NPX pipeline, bridge-sample normalization, limma, MSstats

What Pepkio delivers

Every project returns processed data tables, QC-audited figures, version-pinned code, and a Methods draft—plus reviewer clarification support after delivery.

Processed data

Protein/peptide matrices (`.csv`, `.tsv`); Perseus or MSstats contrast tables; imputation strategy in `analysis_parameters.yaml`

Figures (PDF/SVG)

Volcano plots per contrast; batch-diagnostic PCA and correlation heatmaps

Code and documentation

Commented R/Python scripts with conda or `renv` locks; HTML QC report; README; Methods draft—you retain full ownership

Support

Milestone check-ins; reviewer clarification and minor revisions within agreed scope (typically ≤20% of deliverables); GitHub/Zenodo archival on request

How the analysis works — step by step

1. Validate inputs and experimental design
Confirm raw file integrity, metadata, contrasts, and batch structure; flag confounded designs before processing (Čuklina et al., 2020).
Tools and outputs
Output: sample_manifest.csv
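
The confounding check in step 1 can be illustrated with a short sketch. This is a minimal example, not Pepkio's production code: the function name `find_confounding` and the manifest keys `sample`, `condition`, and `batch` are assumptions made for illustration. It reports batches that contain only one condition and flags designs in which every batch does, because batch and condition effects then cannot be separated.

```python
from collections import defaultdict


def find_confounding(samples):
    """Report whether batch and condition are confounded.

    `samples` is a list of dicts with the hypothetical keys
    "sample", "condition", and "batch".
    """
    conditions_per_batch = defaultdict(set)
    all_conditions = set()
    for row in samples:
        conditions_per_batch[row["batch"]].add(row["condition"])
        all_conditions.add(row["condition"])

    # A batch holding a single condition adds nothing to a within-batch contrast.
    single_condition_batches = sorted(
        batch for batch, conds in conditions_per_batch.items() if len(conds) == 1
    )
    # Fully confounded: more than one condition exists, yet no batch mixes them.
    fully_confounded = (
        len(all_conditions) > 1
        and len(single_condition_batches) == len(conditions_per_batch)
    )
    return {
        "single_condition_batches": single_condition_batches,
        "fully_confounded": fully_confounded,
    }
```

A design flagged as fully confounded is best resolved before acquisition (for example by re-randomizing samples across batches), since no downstream model can recover the separation.
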
2. Convert and inspect raw data
Thermo `.raw` and Bruker `.d` files load directly into MaxQuant or FragPipe; SCIEX `.wiff` files are converted to mzML when needed. Run-level MS QC and Olink NPX control checks follow (Assarsson et al., 2014).
Tools and outputs
Tools used: MaxQuant 2.8.0.0 or FragPipe 22+
3. Identify peptides and infer proteins
Search UniProt with decoys; filter at 1% FDR (Cox & Mann, 2008).
Tools and outputs
Tools used: MaxQuant/Andromeda or MSFragger + Philosopher
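
The 1% FDR filter in step 3 rests on target-decoy counting, which can be sketched in a few lines. This is a simplified illustration of the general idea, assuming a concatenated target-decoy search and a score where higher is better; MaxQuant and Philosopher apply their own, more elaborate scoring and protein-level procedures. The function names are hypothetical.

```python
def target_decoy_qvalues(psms):
    """Compute q-values for peptide-spectrum matches.

    `psms` is a list of (score, is_decoy) tuples; a higher score means a
    better match. Returns (score, is_decoy, q_value) sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among targets accepted at this score threshold.
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at this threshold or any more permissive one.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def filter_at_fdr(psms, threshold=0.01):
    """Keep target matches whose q-value is at or below the threshold."""
    return [
        (score, q)
        for score, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= threshold
    ]
```
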
4. Quantify protein abundances
Quantification follows the acquisition design: MaxLFQ for label-free data, MaxQuant isobaric quantification for TMT/iTRAQ, or DIA-NN for DIA (Cox et al., 2014; Yu et al., 2023).
Tools and outputs
Tools used: MaxQuant, FragPipe, DIA-NN 2.0+
5. Normalize and diagnose batch effects
PCA shows whether samples cluster by condition or by batch; proBatch quantifies the variance attributable to batch (Biskup et al., 2021).
Tools and outputs
Tools used: Perseus 2.0, proBatch 1.0, limma 3.60
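
Two ideas from step 5 can be sketched without any external library: median-centering of log2 intensities, and a per-protein estimate of how much variance batch explains. This is a simplified stand-in under stated assumptions, not the proBatch or Perseus API: the function names are hypothetical, and the batch measure shown is the one-way ANOVA eta squared (between-batch sum of squares over total sum of squares).

```python
from statistics import mean, median


def median_center(matrix):
    """Shift each sample so all samples share the same median.

    `matrix` maps sample -> {protein: log2 intensity}; missing values are
    simply absent from the inner dict.
    """
    sample_medians = {s: median(v.values()) for s, v in matrix.items()}
    target = median(sample_medians.values())
    return {
        s: {p: x - sample_medians[s] + target for p, x in v.items()}
        for s, v in matrix.items()
    }


def batch_variance_fraction(values, batches):
    """Fraction of one protein's variance explained by batch (eta squared).

    `values` and `batches` are parallel lists for the same protein.
    """
    grand = mean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    if ss_total == 0:
        return 0.0
    groups = {}
    for x, b in zip(values, batches):
        groups.setdefault(b, []).append(x)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total
```

A protein whose fraction is close to 1 varies almost entirely with batch; a distribution of these fractions across proteins gives a quick view of how strongly batch shapes the data before any correction is considered.
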
6. Filter proteins and handle missing values
Exclude low-detection proteins; MNAR/MCAR patterns guide imputation or complete-case analysis (Lazar et al., 2016).
Tools and outputs
Tools used: Perseus 2.0, proDA 1.0
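
The detection filter in step 6 can be sketched as follows. This is an illustration only: the rule shown (keep a protein observed in at least a minimum fraction of samples in at least one condition) is a common convention, and the 0.7 default is an assumed value, not a documented Pepkio threshold. Function names are hypothetical.

```python
def filter_by_detection(matrix, conditions, min_fraction=0.7):
    """Keep proteins observed often enough in at least one condition.

    `matrix` maps protein -> list of log2 intensities (None = missing);
    `conditions` is a parallel list of condition labels, one per sample.
    """
    kept = {}
    for protein, values in matrix.items():
        observed = {}
        totals = {}
        for value, cond in zip(values, conditions):
            totals[cond] = totals.get(cond, 0) + 1
            observed[cond] = observed.get(cond, 0) + (value is not None)
        if any(observed[c] / totals[c] >= min_fraction for c in totals):
            kept[protein] = values
    return kept


def missing_fraction(values):
    """Share of missing entries for one protein across all samples."""
    return sum(v is None for v in values) / len(values)
```

Filtering per condition rather than across all samples keeps proteins that are absent in one group and consistently present in the other, a pattern that often reflects abundance-dependent (MNAR) missingness rather than noise.
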
7. Test differential protein abundance
MSstats, limma, or proDA models each contrast, including batch and covariates where the design permits (Choi et al., 2014; Ritchie et al., 2015).
Tools and outputs
Output: dep_results_<contrast>.csv
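
Differential-abundance tables such as the one in step 7 normally report p-values adjusted for multiple testing. The sketch below shows the Benjamini-Hochberg procedure as one widely used adjustment; whether a given project uses it is an assumption here, and the function name is hypothetical. The model fitting itself (MSstats, limma, proDA) is not reproduced.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.5])` scales the smallest p-value by 4/1 and the largest by 4/4, then caps each adjusted value at the one ranked above it.
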
8. Run pathway enrichment and interpret results
GO/KEGG over-representation on significant proteins (Tyanova et al., 2016).
Tools and outputs
Tools used: Perseus 2.0
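
Over-representation testing in step 8 asks whether a GO or KEGG term contains more significant proteins than chance would predict. A minimal sketch using the upper tail of the hypergeometric distribution (equivalent to a one-sided Fisher exact test) is shown below; it illustrates the statistic only and is not Perseus's implementation. The function and parameter names are assumptions.

```python
from math import comb


def ora_pvalue(n_background, n_in_set, n_significant, n_overlap):
    """Probability of at least `n_overlap` annotated proteins among the hits.

    n_background:  proteins quantified in the experiment (the test universe)
    n_in_set:      background proteins annotated to the term
    n_significant: proteins called significant
    n_overlap:     significant proteins annotated to the term
    """
    max_overlap = min(n_in_set, n_significant)
    total = comb(n_background, n_significant)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_significant - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

The background should be the set of proteins actually quantified, not the whole proteome; using the full proteome tends to overstate enrichment for terms dominated by abundant, easily detected proteins.
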
9. Package figures, scripts, and Methods draft
Assemble figures, tables, scripts, README, and Methods draft into the final bundle.

Tools and standards we use

Core stack for MS and Olink projects; versions pinned at kickoff and cited in the Methods draft.

Proteomics tools and standards
Tool	Version	Role	Primary citation
MaxQuant	2.8.0.0	DDA/LFQ/TMT identification and MaxLFQ quantification	Cox & Mann, 2008 — https://doi.org/10.1038/nbt.1511
MSFragger (FragPipe)	4.x / 22+	Fast peptide search	Kong et al., 2017 — https://doi.org/10.1038/nmeth.4256
DIA-NN	2.0+	DIA precursor and protein quantification	Demichev et al., 2020 — https://doi.org/10.1038/s41592-020-00998-0
Philosopher	5.x	FDR filtering and protein inference (FragPipe)	da Veiga Leprevost et al., 2020 — https://doi.org/10.1038/s41596-020-0356-x
Perseus	2.0	Normalization, imputation, enrichment, visualization	Tyanova et al., 2016 — https://doi.org/10.1038/nmeth.3901
MSstats	2.14	MS-aware differential abundance modeling	Choi et al., 2014 — https://doi.org/10.1093/bioinformatics/btu305
proDA	1.0	Probabilistic dropout modeling for MNAR data	Korkmaz et al., 2020 — https://doi.org/10.1038/s41592-020-00949-9
limma	3.60	Linear models for Olink NPX and matrix-based DE	Ritchie et al., 2015 — https://doi.org/10.1093/nar/gkv007
proBatch	1.0	Batch-effect diagnostics and correction	Biskup et al., 2021 — https://doi.org/10.15252/msb.202110232
Olink PEA platform	Panel-specific	Multiplex immunoassay basis for NPX QC and normalization	Assarsson et al., 2014 — https://doi.org/10.1371/journal.pone.0095192

Common challenges — and how we handle them

Missing values, batch effects, and platform heterogeneity require method choices documented before differential testing.

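Missing values with different biological meaning: Values missing not at random (MNAR, typically below the detection limit) and values missing completely at random (MCAR) need different handling, and imputing both with a single method can inflate false positives (Lazar et al., 2016). Pepkio profiles missingness and documents the imputation or complete-case strategy before testing.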
Batch effects across MS acquisition runs: Column changes, instrument drift, and prep dates introduce non-biological variance (Čuklina et al., 2020). Batch diagnostics precede modeling; batch-in-design or proBatch correction applies only when batch is not confounded with condition.
mRNA–protein discordance: Translation, degradation, and localization decouple transcript and protein abundance (Clark et al., 2019). Proteogenomic correlation reports can be scoped at kickoff when matched RNA is available.
Vendor-specific raw file formats: Thermo `.raw` and Bruker `.d` load in MaxQuant or FragPipe; SCIEX `.wiff` converts to mzML before search when needed.
Over-stringent protein inference filtering: Aggressive grouping can collapse isoforms. Grouping rules are documented in the Methods draft; strict and expanded protein lists available for reviewer sensitivity analyses.

Common questions

What data do I need to provide for a proteomics analysis project?

Provide raw MS files (`.raw`, `.d`, `.wiff`, or mzML), a sample metadata table with condition and batch columns, and the organism plus UniProt database preference. Olink projects require NPX export files and the panel lot report. Pre-processed protein matrices are accepted when upstream search parameters are documented. Custom input formats and non-standard metadata structures are supported when scoped at kickoff.

How long does proteomics analysis take at Pepkio?

Standard label-free or DIA projects (10–30 samples, one contrast) typically complete in 2–4 weeks; TMT, single-cell MS, multi-panel Olink, or phosphoproteomics extensions may require 4–8 weeks. Milestone check-ins occur after QC and before delivery; exact timelines are confirmed at kickoff.

What do the deliverables look like?

You receive protein and peptide matrices, differential-abundance tables, volcano plots and PCA figures (PDF/SVG), an HTML QC report, commented R or Python scripts with conda or `renv` lock files, a README, and a Methods draft. Optional GitHub or Zenodo archival is available on request.

Can you handle Thermo, Bruker, or SCIEX raw files?

Yes. Pepkio processes Thermo `.raw` and Bruker `.d` files in MaxQuant 2.8.0.0 or FragPipe 22+ with MSFragger. SCIEX `.wiff` files are converted to mzML before search when needed; mzML and mzXML are also accepted. The Methods draft records the search engine, database version, and FDR thresholds applied.

What if my data quality is poor or missing values are high?

Pepkio documents identification rates, missing-value fractions, and outlier samples in the QC report before differential testing. High missingness triggers a review of whether MNAR-aware modeling (proDA), Perseus imputation, or complete-case analysis is most appropriate (Lazar et al., 2016; Tyanova et al., 2016). Re-analysis after sample re-measurement is discussed when QC threatens the study question.

Do you provide the code, and can I reproduce the results?

Yes. You retain full ownership of all code and results. Pepkio delivers commented R and Python scripts with conda or `renv` lock files listing exact package versions. Scripts are organized by pipeline stage with a README describing execution from raw inputs to final tables.

Can I be involved during the analysis?

Yes. Checkpoint reviews occur after raw-data QC, after normalization and batch diagnostics, and before final delivery. You can review contrast definitions, covariate inclusion, imputation choices, and FDR thresholds within agreed scope. A dedicated scientific contact leads the project and is available for scheduled calls.

What happens if a reviewer requests changes after delivery?

Post-delivery support covers clarification of methods, QC thresholds, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Substantial new analyses—additional contrasts, re-imputation with alternate methods, or proteogenomic extensions—are scoped as separate milestones, consistent with Pepkio's reviewer-support policy.

Can you analyze Olink NPX files without re-running the assay?

Yes. Pepkio imports Olink NPX Signature exports, applies per-sample QC flagging (incubation and detection controls), and performs between-plate normalization using bridge samples or intensity normalization as appropriate for the study design (Assarsson et al., 2014). Differential testing uses limma or MSstats with the normalization method documented in the Methods draft.
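
Bridge-sample normalization can be sketched for a single assay as follows. Because NPX is on a log2 scale, a between-plate shift is corrected with an additive offset; the sketch takes that offset as the median of the paired differences across bridge samples measured in both runs. This is a minimal illustration under those assumptions, with hypothetical function names, and it does not reproduce Olink's software.

```python
from statistics import median


def bridge_adjustment(reference_npx, new_npx):
    """Offset to add to the new run's NPX values for one assay.

    Both arguments map bridge sample ID -> NPX for the same assay, measured
    in the reference run and in the new run respectively.
    """
    shared = sorted(set(reference_npx) & set(new_npx))
    if not shared:
        raise ValueError("no bridge samples shared between the two runs")
    # The median of paired differences is robust to a single outlying bridge sample.
    return median(reference_npx[s] - new_npx[s] for s in shared)


def apply_adjustment(npx_values, offset):
    """Shift every NPX value from the new run by the assay-specific offset."""
    return [x + offset for x in npx_values]
```

The offset is computed per assay, so a panel with many proteins yields one adjustment factor for each.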

Do you support custom analyses outside standard differential-protein workflows?

Yes. Bespoke analyses—custom contrasts, proteogenomic integration, phosphoproteomics extensions, ML classifiers, or non-standard deliverables—are scoped at kickoff after a feasibility review.

Related services

Transcriptomics — Proteogenomic integration when matched RNA-seq and proteomics data are available from the same samples.
Metabolomics — Joint pathway interpretation linking protein abundance changes to metabolite-level phenotypes.
Multi-omics integration — Cross-layer modeling when proteomics, transcriptomics, and metabolomics datasets share sample IDs.
Biomarker discovery — Supervised classifier development and validation on protein panels from MS or Olink data.
Bioinformatics consulting — Experimental design, replicate planning, and platform selection before sample preparation and data acquisition.

References

Adkins JN, Baker MS, Bairoch A, et al. The 2024 report on the human proteome from the HUPO Human Proteome Project. Journal of Proteome Research. 2024. https://doi.org/10.1021/acs.jproteome.4c00776 (PMID: 39514846)
Assarsson E, Lundberg M, Holmquist G, et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLOS ONE. 2014;9(4):e95192. https://doi.org/10.1371/journal.pone.0095192 (PMID: 24755770)
Bäckryd E, Tanum L, Lind AL, Larsson A, Gordh T. Evidence of both systemic inflammation and neuroinflammation in fibromyalgia patients, as assessed by a multiplex protein panel applied to the cerebrospinal fluid and to plasma. Journal of Pain Research. 2017;10:515–525. https://doi.org/10.2147/JPR.S128508 (PMID: 28424559)
BCC Research. Proteomics: Technologies and Global Markets. May 2025. https://www.bccresearch.com/market-research/biotechnology/proteomics-technologies-and-global-markets.html
Biskup K, Čuklina J, Mehnert M, et al. Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial. Molecular Systems Biology. 2021;17(8):e10232. https://doi.org/10.15252/msb.202110232 (PMID: 34432947)
Choi M, Chang CY, Clough T, et al. MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics. 2014;30(17):2524–2526. https://doi.org/10.1093/bioinformatics/btu305 (PMID: 24794931)
Clark DJ, Dhanasekaran SM, Petralia F, et al. Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell. 2019;179(4):964–983.e31. https://doi.org/10.1016/j.cell.2019.10.007 (PMID: 31675502)
Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature Biotechnology. 2008;26(12):1367–1372. https://doi.org/10.1038/nbt.1511 (PMID: 19029910)
Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Molecular & Cellular Proteomics. 2014;13(9):2513–2526. https://doi.org/10.1074/mcp.M113.031591 (PMID: 24942700)
Čuklina J, Pedrioli PGA, Aebersold R. Review of batch effects prevention, diagnostics, and correction approaches. Methods in Molecular Biology. 2020;2051:373–387. https://doi.org/10.1007/978-1-4939-9744-2_16 (PMID: 31552638)
da Veiga Leprevost F, Haynes SE, Avtonomov DM, et al. Philosopher: a versatile toolkit for shotgun proteomics data analysis. Nature Methods. 2020;17(9):869–870. https://doi.org/10.1038/s41596-020-0356-x (PMID: 32669682)
Demichev V, Messner CB, Vernardis SI, Lilley KS, Ralser M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nature Methods. 2020;17(1):41–44. https://doi.org/10.1038/s41592-020-00998-0 (PMID: 31768060)
Deutsch EW, Orsburn BC, Aebersold R, et al. Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments. Nature Methods. 2023;20(3):375–386. https://doi.org/10.1038/s41592-023-01785-3 (PMID: 36864200)
Kong AT, Leprevost FV, Avtonomov DM, Mellachernou D, Nesvizhskii AI. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nature Methods. 2017;14(5):513–520. https://doi.org/10.1038/nmeth.4256 (PMID: 28394336)
Korkmaz S, Cox J, Mann M. proDA: a powerful and flexible tool for identifying differentially abundant proteins in label-free mass spectrometry data. Nature Methods. 2020;17(5):437–440. https://doi.org/10.1038/s41592-020-00949-9
Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. Journal of Proteome Research. 2016;15(4):1116–1125. https://doi.org/10.1021/acs.jproteome.5b00981 (PMID: 26906401)
Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43(7):e47. https://doi.org/10.1093/nar/gkv007 (PMID: 25605792)
Specht H, Emmott E, Petelski AA, et al. Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity using SCoPE2. Genome Biology. 2021;22(1):50. https://doi.org/10.1186/s13059-021-02267-5 (PMID: 33504367)
Tyanova S, Temu T, Sinitcyn P, et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nature Methods. 2016;13(9):731–740. https://doi.org/10.1038/nmeth.3901 (PMID: 27348712)
Wigren M, Söderberg S, Bäckryd E, et al. Decreased levels of stem cell factor in subjects with incident coronary events. Journal of Internal Medicine. 2016;279(2):180–191. https://doi.org/10.1111/joim.12443 (PMID: 26467529)
Yu F, Teo GC, Kong AT, et al. Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nature Communications. 2023;14:4154. https://doi.org/10.1038/s41467-023-39869-5 (PMID: 37438352)

Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
