DDA/DIA Proteomics Analysis Service — Library-Based and Library-Free Quantification from Raw LC-MS/MS to MSstats Contrasts

Data-dependent acquisition (DDA) and data-independent acquisition (DIA) LC-MS/MS quantify protein abundance across conditions. Pepkio delivers version-pinned DDA/DIA analysis for academic, biotech, and pharma clients, from vendor raw files or protein matrices through to differential-abundance tables; custom and bespoke work is scoped at kickoff. As a reference point for library depth, a benchmark universal DDA library comprised 174,115 peptide precursors mapped to 11,725 proteins (Jiang et al., 2023). Documented scripts, figures, and a Methods draft are included.

Key facts

Key facts about DDA/DIA Proteomics
Fact	Value
Supported platforms / instruments	Thermo Orbitrap (Exploris, Q Exactive, Fusion Lumos; Astral nDIA when scoped); Bruker timsTOF (dia-PASEF); SCIEX TripleTOF; ZenoTOF and Waters when scoped at kickoff; protein matrices accepted
Input requirements	Thermo `.raw`, Bruker `.d`, SCIEX `.wiff` (converted to mzML when needed), or mzML/mzXML; multiple biological replicates per condition (often ≥3) for variance estimation (Choi et al., 2014; Cairns et al., 2009); TMT/iTRAQ plex design documented at kickoff; phosphoproteomics and fractionated libraries scoped separately
Reference builds supported	UniProt Swiss-Prot reviewed proteomes (human UP000005640, mouse UP000000589, yeast UP000002311); CPTAC-style contaminant sequences; organism-specific FASTA on request
Primary tools (with versions)	MaxQuant 2.8+; FragPipe 22+ with MSFragger 4.x; Philosopher 5.x; DIA-NN 2.0+; directLFQ; Perseus 2.0; MSstats 2.14+; proDA 1.0; proBatch 1.0; limma 3.60 — pinned per project
Typical turnaround time	2–4 weeks (standard LFQ or DIA cohort, 4–24 samples, one contrast); 4–8 weeks (TMT multiplex, phosphoproteomics, multi-contrast designs, or large cohorts) — confirmed at kickoff
Deliverable formats	Protein and peptide matrices (`.tsv`, `.csv`); MSstats/proDA contrast tables; PDF/SVG volcano plots, heatmaps, and PCA; HTML QC report; commented R/Python scripts; Methods draft
Key cited best-practice reference	Yu et al. (2024), Molecular & Cellular Proteomics (DIA survey); Jiang et al. (2023), Nature Communications (DIA software benchmarking)
Custom / bespoke analysis	Non-standard inputs, outputs, contrasts, phosphoproteomics extensions, affinity proteomics, proteogenomic correlation, or client-specified search engines scoped at kickoff

What is DDA/DIA proteomics?

DDA/DIA proteomics identifies peptides from LC-MS/MS spectra and aggregates precursor or fragment intensities into sample-by-protein abundance matrices, which show which proteins change in abundance or modification between conditions (Cox & Mann, 2008; Yu et al., 2024). DDA selects the most intense precursors in each survey scan for fragmentation; DIA fragments all precursors within defined *m/z* windows, improving run-to-run quantitative consistency (Yu et al., 2024). DIA-NN and Spectronaut each showed distinct advantages in Orbitrap and timsTOF benchmark studies (Jiang et al., 2023). Narrow-window DIA on Orbitrap Astral quantifies ~10,000 human protein groups in a 30-minute gradient (Mann et al., 2023). Pepkio processes both modes with documented library, FDR, and normalization choices; custom workflows are agreed at kickoff. See the DDA/DIA proteomics glossary.
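The difference between the two acquisition modes can be sketched in a few lines of Python. This is a toy illustration only: the precursor list, the top-N value, and the 100 *m/z* window width are made-up assumptions, and real instruments apply further logic (dynamic exclusion, variable windows) that is not modeled here.

```python
# Hypothetical precursors seen in one survey scan: (m/z, intensity).
PRECURSORS = [
    (402.2, 5.0e6),
    (455.7, 8.0e4),
    (510.3, 2.0e7),
    (512.9, 3.0e5),
    (640.1, 9.0e6),
]


def dda_top_n(precursors, n):
    """DDA-style selection: keep only the n most intense precursors."""
    ranked = sorted(precursors, key=lambda p: p[1], reverse=True)
    return sorted(mz for mz, _ in ranked[:n])


def dia_windows(precursors, start, stop, width):
    """DIA-style selection: assign every precursor in [start, stop) to a fixed window."""
    windows = {}
    for mz, _ in precursors:
        if start <= mz < stop:
            lower = start + int((mz - start) // width) * width
            windows.setdefault((lower, lower + width), []).append(mz)
    return windows


dda_selected = dda_top_n(PRECURSORS, n=2)
dia_selected = dia_windows(PRECURSORS, start=400, stop=700, width=100)

print("DDA fragments:", dda_selected)
for window, mzs in sorted(dia_selected.items()):
    print("DIA window", window, "co-fragments:", mzs)
```

With these toy values, DDA fragments only the two most intense precursors, while DIA fragments all five. The low-intensity precursors are still sampled, which is why DIA tends to give fewer missing values from run to run.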

When should you use DDA/DIA proteomics?

DDA/DIA proteomics fits when the research question requires direct measurement of protein abundance, post-translational modification, or pathway-level protein dysregulation—questions RNA-seq alone cannot fully answer (Clark et al., 2019).

Comparison of DDA label-free, DIA/SWATH, and TMT/iTRAQ multiplex approaches
Approach	Best for	Limitations	Approximate cost range
DDA label-free (LFQ)	Flexible discovery, phospho-enriched DDA, small cohorts, MaxQuant/FragPipe compatibility	Stochastic precursor selection; lower run-to-run reproducibility than DIA	Lower per-sample MS time; moderate bioinformatics cost
DIA / SWATH	Large cohorts, quantitative consistency, library-free workflows	Spectral library strategy affects depth; platform-specific acquisition tuning	Moderate–higher MS acquisition time; efficient per-sample quantification at scale
TMT / iTRAQ multiplex	Controlled multi-condition comparison in one MS run (typical 6–16-plex)	Ratio compression; isobaric interference; complex designs need careful planning	Higher reagent and MS cost; strong for paired, multiplexed designs

Tumor vs. normal proteome rewiring: Clark et al. (2019) profiled 103 clear cell renal cell carcinoma tumors and 80 normal adjacent tissues by TMT-based quantitative MS, identifying 820 differentially abundant proteins—including oxidative phosphorylation downregulation visible at the protein but not mRNA level (Clark et al., 2019).
High-throughput clinical-scale profiling: Mann et al. (2023) introduced narrow-window DIA (nDIA) on Orbitrap Astral, reporting nearly complete coverage of ~12,000 expressed human proteins within 3–4.5 h of analysis for large-cohort biomarker studies (Mann et al., 2023).
Low-input heterogeneity: Rosenberger et al. (2023) applied sensitivity-tailored DIA to mouse embryonic stem cells, revealing proteome subclusters with distinct metabolic enzyme expression invisible in bulk averages (Rosenberger et al., 2023).

How the analysis works — step by step

1. Validate inputs and experimental design
Pepkio confirms raw file integrity (or matrix dimensions), acquisition mode (DDA, DIA, TMT), organism, and contrast definitions. Sample IDs, biological replicate IDs, batch, and condition are recorded in sample_manifest.csv. Designs with batch fully confounded by condition or very few biological replicates are flagged before processing (Choi et al., 2014; Cairns et al., 2009).
Tools and outputs
Tools used: Custom validation scripts; MSstats annotation schema check
Output: sample_manifest.csv with raw file paths, replicate IDs, condition labels, and design notes
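A minimal sketch of the kind of design check described in this step, assuming a manifest with `sample_id`, `condition`, `batch`, and `bio_replicate` columns. The column names, the example rows, and the three-replicate threshold are illustrative assumptions, not Pepkio's actual validation script.

```python
import csv
import io
from collections import defaultdict

# Hypothetical manifest content; column names are assumptions for illustration.
MANIFEST = """sample_id,condition,batch,bio_replicate
S1,control,B1,1
S2,control,B1,2
S3,control,B1,3
S4,treated,B2,1
S5,treated,B2,2
"""


def check_design(manifest_text, min_replicates=3):
    """Return a list of human-readable design warnings."""
    rows = list(csv.DictReader(io.StringIO(manifest_text)))
    replicates = defaultdict(set)   # condition -> biological replicate IDs
    batch_conditions = defaultdict(set)  # batch -> conditions seen in it
    for row in rows:
        replicates[row["condition"]].add(row["bio_replicate"])
        batch_conditions[row["batch"]].add(row["condition"])

    warnings = []
    for condition, ids in sorted(replicates.items()):
        if len(ids) < min_replicates:
            warnings.append(f"{condition}: only {len(ids)} biological replicates")
    # If no batch contains more than one condition, batch and condition
    # effects cannot be separated within any batch.
    if len(batch_conditions) > 1 and all(len(c) == 1 for c in batch_conditions.values()):
        warnings.append("batch is fully confounded with condition")
    return warnings


design_warnings = check_design(MANIFEST)
for message in design_warnings:
    print("WARNING:", message)
```

In this example both checks fire: the treated group has two replicates, and each batch holds a single condition.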
2. Convert and inspect raw MS data
Thermo `.raw` and Bruker `.d` files load in MaxQuant, FragPipe, or DIA-NN; SCIEX `.wiff` converts to mzML when needed. Run-level QC checks TIC stability, scan counts, and precursor distribution before search (Yu et al., 2024).
Tools and outputs
Tools used: MaxQuant 2.8+; FragPipe 22+; msconvert when required
Output: ms_qc_summary.csv; per-run QC plots; mzML when applicable
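One simple way to screen for unstable runs is to compare each run's summed TIC with the cohort median. The sketch below assumes the TIC totals have already been extracted. The run names, values, and the one-log2-unit threshold are hypothetical and would be tuned per project.

```python
import math
from statistics import median

# Hypothetical summed total ion current per run (arbitrary units).
TIC_BY_RUN = {
    "run01": 9.8e9,
    "run02": 1.02e10,
    "run03": 1.01e10,
    "run04": 3.1e9,
    "run05": 9.9e9,
}


def flag_tic_outliers(tic_by_run, max_log2_dev=1.0):
    """Flag runs whose TIC differs from the cohort median by more than
    max_log2_dev on the log2 scale (1.0 means a two-fold difference)."""
    cohort_median = median(tic_by_run.values())
    return sorted(
        run
        for run, tic in tic_by_run.items()
        if abs(math.log2(tic / cohort_median)) > max_log2_dev
    )


flagged_runs = flag_tic_outliers(TIC_BY_RUN)
print("Runs flagged for review:", flagged_runs)
```

A flagged run is a prompt for inspection, not an automatic exclusion; scan counts and precursor distributions would be reviewed alongside it.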
3. Build or select spectral library (DIA)
Library strategy is set at kickoff: an experimental DDA library, DIA-NN library-free search with predicted spectra, or a hybrid DDA+DIA library (Yu et al., 2023; Jiang et al., 2023). Coverage is logged against benchmark libraries (e.g., 174,115 precursors → 11,725 proteins in Jiang et al., 2023).
Tools and outputs
Tools used: FragPipe 22+ / EasyPQP; DIA-NN 2.0+; Spectronaut when client supplies exports
Output: Spectral library (`.speclib`, `.tsv`, or native format); library_qc_report.csv
4. Identify peptides and infer proteins
Searches use UniProt Swiss-Prot with reversed decoys at 1% peptide and protein FDR (Cox & Mann, 2008; da Veiga Leprevost et al., 2020). DDA uses Andromeda or MSFragger; DIA uses DIA-NN or MSFragger-DIA (Demichev et al., 2020; Yu et al., 2023).
Tools and outputs
Tools used: MaxQuant 2.8+; MSFragger 4.x + Philosopher 5.x; DIA-NN 2.0+
Output: proteinGroups.txt / report.parquet; psm.tsv; protein_inference_log.csv
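The target-decoy idea behind the 1% FDR threshold can be illustrated with a simplified q-value calculation. This sketch is not the MaxQuant, Philosopher, or DIA-NN implementation; the scores are invented, and the FDR estimate (decoys divided by targets above a score threshold) is the basic textbook form.

```python
def target_decoy_qvalues(psms):
    """psms: iterable of (score, is_decoy); higher score means a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least this well.
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at which this match would still be accepted.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


# Hypothetical search scores; True marks a reversed-decoy hit.
PSMS = [(98, False), (95, False), (90, False), (85, True),
        (80, False), (70, True), (60, False)]

scored = target_decoy_qvalues(PSMS)
accepted = [s for s, is_decoy, q in scored if not is_decoy and q <= 0.01]
print("Targets accepted at 1% FDR:", accepted)
```

Production tools add refinements (score rescoring, protein-level grouping, tie handling) that this sketch leaves out.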
5. Quantify protein abundances
DDA uses MaxLFQ or FragPipe LFQ (Cox et al., 2014); DIA uses DIA-NN quantities, with directLFQ when the design matches the workflows benchmarked by Zhang et al. (2024). TMT/iTRAQ uses MaxQuant reporter-ion extraction with isotope impurity correction.
Tools and outputs
Tools used: MaxQuant 2.8+; DIA-NN 2.0+; directLFQ; FragPipe 22+
Output: protein_matrix_raw.csv; peptide_matrix_raw.csv; quantification log
6. Normalize and diagnose batch effects
Detection rates, log-intensity distributions, and sample correlations are summarized; PCA and clustering show whether samples group by condition or by batch. proBatch quantifies batch variance before correction (Biskup et al., 2021).
Tools and outputs
Tools used: Perseus 2.0; proBatch 1.0; custom R diagnostics
Output: sample_qc_summary.csv; PCA/correlation plots; proBatch report
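As one concrete example of a normalization step, the sketch below median-centers log2 intensities per sample. Median centering is an assumption chosen for illustration; the method actually applied is agreed per project and recorded in the analysis parameters. Sample and protein names are hypothetical, and `None` marks a missing value.

```python
from statistics import median

# Hypothetical log2 intensities: sample -> protein -> value (None = missing).
LOG2_MATRIX = {
    "A": {"P1": 20.0, "P2": 22.0, "P3": 24.0},
    "B": {"P1": 21.0, "P2": 23.0, "P3": None},
    "C": {"P1": 19.0, "P2": 21.0, "P3": 23.0},
}


def median_center(log2_matrix):
    """Shift each sample so its median log2 intensity equals the
    median of all sample medians. Missing values stay missing."""
    sample_medians = {
        sample: median(v for v in proteins.values() if v is not None)
        for sample, proteins in log2_matrix.items()
    }
    target = median(sample_medians.values())
    return {
        sample: {
            protein: None if value is None else value - sample_medians[sample] + target
            for protein, value in proteins.items()
        }
        for sample, proteins in log2_matrix.items()
    }


normalized = median_center(LOG2_MATRIX)
for sample, proteins in normalized.items():
    print(sample, proteins)
```

Here sample C sits one log2 unit below the others and is shifted up by one unit, while A and B are unchanged.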
7. Filter proteins and handle missing values
Proteins detected in too few samples are filtered out; whether values are missing not at random (MNAR) or missing completely at random (MCAR) guides the choice of imputation (Lazar et al., 2016). The strategy is logged in analysis_parameters.yaml.
Tools and outputs
Tools used: Perseus 2.0; proDA 1.0
Output: Filtered matrix; missing_value_profile.csv; protein_filter_log.csv
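A minimal sketch of detection-rate filtering followed by a simplified MinDet-style imputation, in which each missing value is replaced by the lowest value observed in the same sample. The 50% detection threshold and the example matrix are assumptions; published MinDet implementations typically use a low quantile rather than the strict minimum.

```python
# Hypothetical log2 intensities: protein -> values across four samples.
MATRIX = {
    "P1": [20.0, 21.0, 20.5, 21.5],
    "P2": [18.0, None, 18.5, 19.0],
    "P3": [None, None, None, 17.0],
    "P4": [16.0, 16.5, 16.2, 16.8],
}


def filter_by_detection(matrix, min_fraction=0.5):
    """Keep proteins observed in at least min_fraction of samples."""
    return {
        protein: values
        for protein, values in matrix.items()
        if sum(v is not None for v in values) / len(values) >= min_fraction
    }


def impute_sample_minimum(matrix):
    """Replace each missing value with the minimum observed in that sample.
    Assumes every sample has at least one observed value."""
    n_samples = len(next(iter(matrix.values())))
    sample_minimums = [
        min(values[j] for values in matrix.values() if values[j] is not None)
        for j in range(n_samples)
    ]
    return {
        protein: [sample_minimums[j] if v is None else v for j, v in enumerate(values)]
        for protein, values in matrix.items()
    }


filtered = filter_by_detection(MATRIX)
imputed = impute_sample_minimum(filtered)
print("Kept proteins:", sorted(filtered))
print("P2 after imputation:", imputed["P2"])
```

Left-censored imputation of this kind suits MNAR patterns, where values are missing because they fall below the detection limit. It is a poor fit when values are missing at random, which is why the missingness profile is checked first.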
8. Test differential protein abundance
MSstats, proDA, or limma fit the agreed design matrix with Benjamini–Hochberg FDR control (Choi et al., 2014; Korkmaz et al., 2020; Ritchie et al., 2015).
Tools and outputs
Tools used: MSstats 2.14+; proDA 1.0; limma 3.60
Output: dep_results_<contrast>.csv (ProteinID, Gene, log2FC, SE, DF, pvalue, adj.pvalue)
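The `adj.pvalue` column is produced by Benjamini–Hochberg adjustment. A minimal pure-Python sketch of that procedure follows. The p-values are invented for illustration; in practice the adjustment is done inside MSstats, proDA, or limma.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-protein p-values from one contrast.
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.6]

adjusted = benjamini_hochberg(P_VALUES)
n_significant = sum(q <= 0.05 for q in adjusted)
print("Adjusted p-values:", [round(q, 5) for q in adjusted])
print("Proteins at 5% FDR:", n_significant)
```

Note that the third and fourth proteins have raw p-values below 0.05 but adjusted values just above it, which is the expected effect of multiple-testing control.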
9. Run pathway enrichment and interpret results
Significant proteins are tested for GO/KEGG over-representation; dot plots summarize enriched terms (Tyanova et al., 2016).
Tools and outputs
Tools used: Perseus 2.0
Output: go_enrichment.csv, kegg_enrichment.csv; enrichment plots (PDF/SVG)
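Over-representation is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher exact test. The sketch below shows that calculation for a single term. The counts are hypothetical, and a real analysis would correct the resulting p-values for the number of terms tested.

```python
from math import comb


def hypergeom_enrichment_p(hits_in_term, n_significant, term_size, background_size):
    """P(X >= hits_in_term) when n_significant proteins are drawn without
    replacement from a background containing term_size annotated proteins."""
    upper = min(n_significant, term_size)
    total = comb(background_size, n_significant)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, n_significant - k)
        for k in range(hits_in_term, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 quantified proteins, 5 annotated to the term,
# 4 significant proteins, 3 of which carry the annotation.
p_value = hypergeom_enrichment_p(
    hits_in_term=3, n_significant=4, term_size=5, background_size=20
)
print(f"Enrichment p-value: {p_value:.4f}")
```

The background should be the set of proteins actually quantified in the experiment rather than the whole proteome, otherwise terms dominated by abundant, easily detected proteins appear spuriously enriched.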
10. Package figures, scripts, and Methods draft
Figures, scripts, README, and Methods draft are bundled; phospho, affinity, and Spectronaut-native extensions are scoped separately (Yu et al., 2024).
Tools and outputs
Tools used: Perseus 2.0; MSstats 2.14+; custom R/Python scripts
Output: PDF/SVG bundle; scripts; README; Methods draft

What Pepkio delivers

Processed data

Raw and filtered protein and peptide matrices (`.csv`, `.tsv`)
MSstats or proDA contrast outputs
Spectral library files for DIA projects
analysis_parameters.yaml documenting FDR, imputation, and normalization choices

Figures (PDF/SVG)

MS run QC (TIC, precursor counts)
Sample PCA, hierarchical clustering, and correlation heatmap
Missing-value pattern plot
Volcano plot per contrast; top-DEP heatmap
GO/KEGG enrichment dot plots

Tables

sample_manifest.csv, ms_qc_summary.csv, sample_qc_summary.csv
protein_filter_log.csv, missing_value_profile.csv
dep_results_<contrast>.csv, go_enrichment.csv, kegg_enrichment.csv

Code

Commented R or Python scripts per stage
Environment lock files (sessionInfo(), conda export, or renv lock)
Delivery via private Git repository or agreed file transfer

Documentation

HTML QC report with thresholds and outlier flags
README with reproduction instructions
Methods draft citing software versions and search parameters
Post-delivery reviewer support for clarification and minor revisions within agreed scope (typically ≤20% of deliverables)

Technical decisions we make — and why

Decision	Approach	Rationale
DDA search engine	MaxQuant default; FragPipe + MSFragger on request	Integrated Andromeda, MaxLFQ, and TMT (Cox & Mann, 2008; Cox et al., 2014); FragPipe for open-source search and MSFragger-DIA libraries (Kong et al., 2017; Yu et al., 2023).
DIA analysis	DIA-NN library-free default; Spectronaut when client supplies export	Distinct advantages in Jiang et al. (2023) benchmarks; library-free mode per Demichev et al. (2020).
Quantification	MaxLFQ (DDA) / DIA-NN; directLFQ optional	MaxLFQ for DDA (Cox et al., 2014); Zhang et al. (2024) recommend directLFQ from DIA-NN without extra normalization for benchmarked DIA DE workflows.
Missing values	Profile MNAR; proDA or MinDet before blind imputation	MNAR inflates false positives with naive imputation (Lazar et al., 2016); proDA or MinDet per Zhang et al. (2024) and Korkmaz et al. (2020).
Batch handling	Batch in design when not confounded	proBatch diagnostics before modeling (Biskup et al., 2021; Choi et al., 2014).

Common questions

What is the minimum sample size and replicate count for DDA/DIA proteomics?

Multiple biological replicates per condition—often three or more—are recommended to estimate protein-level variance for differential testing (Choi et al., 2014; Cairns et al., 2009). MSstats `designSampleSize` can estimate replicate count or power from pilot data for a target fold change. Pepkio can analyze smaller designs but documents reduced statistical power in the QC report. Sample-size targets are confirmed at kickoff.
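For a rough sense of how fold change and variability drive replicate numbers, the sketch below uses the standard normal-approximation formula for a two-group comparison, n = 2·((z₁₋α/₂ + z_power)·σ/Δ)². This is a back-of-the-envelope estimate for a single protein, not the MSstats `designSampleSize` calculation. It ignores multiple-testing correction and small-sample t-distribution effects, so real requirements are usually higher. The standard deviation and fold-change values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(log2_fc, sd, alpha=0.05, power=0.8):
    """Approximate biological replicates per group needed to detect a
    log2 fold change of log2_fc given a per-group standard deviation sd."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / log2_fc) ** 2)


# Hypothetical pilot estimate: log2-intensity SD of 0.5 within a condition.
n_two_fold = replicates_per_group(log2_fc=1.0, sd=0.5)
n_one_half_fold = replicates_per_group(log2_fc=0.58, sd=0.5)
print("Replicates for a 2-fold change:", n_two_fold)
print("Replicates for a ~1.5-fold change:", n_one_half_fold)
```

Under these assumptions, detecting a smaller fold change at the same variability roughly triples the number of replicates needed.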

Can you analyze low-quality or low-yield MS runs?

Yes, with caveats documented in the QC report. Runs with low PSM counts, unstable TIC, or poor precursor coverage are flagged before differential testing. Outlier samples in PCA are discussed with you; re-acquisition is recommended when yield threatens the study question. Sensitivity-optimized DIA acquisition may be relevant for low-input samples when instrument settings are available (Rosenberger et al., 2023).

Do you support Thermo Orbitrap, Bruker timsTOF, and SCIEX data?

Yes, subject to a format check at kickoff. Thermo `.raw` files from Orbitrap Exploris, Q Exactive, Fusion Lumos, and Astral load in MaxQuant, FragPipe, and DIA-NN. Bruker timsTOF `.d` files support dia-PASEF workflows in DIA-NN. SCIEX TripleTOF `.wiff` files convert to mzML before search when needed. ZenoTOF and Waters SYNAPT/Xevo data are supported when scoped at kickoff.

How long does DDA/DIA proteomics analysis take at Pepkio?

A standard project (roughly 4–24 samples, one primary contrast, LFQ or DIA quantification, and GO/KEGG enrichment) typically completes in 2–4 weeks from data receipt. TMT multiplex designs, phosphoproteomics, multi-contrast studies, or cohorts exceeding 48 samples may require 4–8 weeks. Milestone check-ins occur during the project; exact timelines are confirmed at kickoff.

How do you handle batch effects across MS acquisition runs or prep dates?

When batch is known and not fully confounded with condition, Pepkio includes batch in the MSstats or limma design formula (Choi et al., 2014). PCA and proBatch variance reports are reviewed before modeling. Post-hoc batch correction with proBatch or other correction methods is scoped separately when the design requires it (Biskup et al., 2021).

Do I own the code — and in what format is it delivered?

Yes — you retain full ownership of all code, scripts, and results delivered under the project agreement. Pepkio provides commented R or Python scripts with environment lock files so you can rerun analyses when the execution environment matches the pinned setup. Matrices use standard `.csv` and `.tsv` formats; R Markdown delivery is available on request.

Can I be involved during analysis?

Yes. Checkpoint reviews occur after MS QC, after protein matrix audit, and before final delivery. You can review contrast definitions, library strategy, imputation choices, and FDR thresholds within agreed scope. A dedicated scientific contact leads the project, coordinates milestone feedback, and documents decisions in the QC report and Methods draft.

What does post-delivery reviewer support include?

Support covers clarification of methods, QC thresholds, library construction, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables), consistent with Pepkio's standard post-delivery policy. It does not include open-ended reanalysis or new biological contrasts. Substantial new work—additional contrasts, alternate imputation, or phosphoproteomics extensions—is scoped as separate milestones with updated deliverables and timeline.

Is co-authorship required?

No. Pepkio operates as a fee-for-service provider unless co-authorship is explicitly discussed and agreed in advance. You retain ownership of results and code delivered under the project agreement. Authorship is never a default requirement for computational analysis work on client-generated data, and billing is separate from publication credit.

Should I choose DDA or DIA for my cohort proteomics study?

DIA suits large cohorts requiring consistent quantification across many runs; DDA LFQ suits flexible discovery, phospho-enriched workflows, or when MaxQuant compatibility is required (Jiang et al., 2023; Yu et al., 2024). nDIA on Orbitrap Astral quantifies ~10,000 protein groups in a 30-minute gradient for high-throughput designs (Mann et al., 2023). Pepkio documents trade-offs in the kickoff feasibility review.

Do I need a separate DDA experiment to build a spectral library for DIA?

Not always. DIA-NN library-free mode uses predicted spectra from the UniProt FASTA, avoiding a dedicated DDA run (Demichev et al., 2020; Jiang et al., 2023). Experimental DDA libraries can increase depth for complex proteomes or PTMs. Hybrid DDA+DIA library building via MSFragger-DIA in FragPipe is available when instrument time permits (Yu et al., 2023).

How do you handle TMT ratio compression and missing values in multiplex designs?

TMT reporter intensities are extracted with isotope impurity correction in MaxQuant. Ratio compression from co-isolation is documented in the QC report; SPS-MS3 or alternative quantification can be scoped when needed. Missing values are profiled for MNAR patterns; proDA or group-wise imputation replaces blind Perseus defaults when MNAR dominates (Lazar et al., 2016; Korkmaz et al., 2020).

Can Pepkio run custom or non-standard DDA/DIA analyses?

Yes, when a feasibility review confirms inputs, outputs, and timeline. Bespoke workflows—custom contrasts, phosphoproteomics localization, affinity proteomics, proteogenomic correlation, client-specified search parameters, or non-standard deliverables—are scoped at kickoff with documented milestones rather than assumed as part of the standard DDA/DIA pipeline. Feasibility covers data format, cohort size, and deliverable fit before work begins.

Related services

Single-cell proteomics — Ultra-low-input DIA when bulk averages mask cell-state heterogeneity.
Olink proximity extension — Targeted plasma or CSF validation of MS-discovered protein signatures.
Bulk RNA-seq — Proteogenomic correlation when matched RNA-seq and proteomics data share sample IDs.
Multi-omics integration — Cross-layer modeling when proteomics, transcriptomics, and metabolomics datasets align by sample.
Custom analysis — Non-standard phospho workflows, affinity proteomics, or client-specified toolchains beyond the standard DDA/DIA pipeline.

References

Yu F, Teo GC, Kong AT, et al. Acquisition and analysis of DIA-based proteomic data: a comprehensive survey in 2023. Molecular & Cellular Proteomics. 2024;23(2):100712. https://doi.org/10.1016/j.mcpro.2024.100712 (PMID: 38182042)
Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature Biotechnology. 2008;26(12):1367–1372. https://doi.org/10.1038/nbt.1511 (PMID: 19029910)
Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Molecular & Cellular Proteomics. 2014;13(9):2513–2526. https://doi.org/10.1074/mcp.M113.031591 (PMID: 24942700)
Jiang L, Wang D, Wright C, et al. Benchmarking commonly used software suites and analysis workflows for DIA proteomics and phosphoproteomics. Nature Communications. 2023;14:94. https://doi.org/10.1038/s41467-022-35740-1 (PMID: 36609502)
Yu F, Haynes SE, Nesvizhskii AI. Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nature Communications. 2023;14:4154. https://doi.org/10.1038/s41467-023-39869-5 (PMID: 37438352)
Demichev V, Messner CB, Vernardis SI, Lilley KS, Ralser M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nature Methods. 2020;17(1):41–44. https://doi.org/10.1038/s41592-019-0638-x (PMID: 31768060)
Choi M, Chang C-Y, Clough T, et al. MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics. 2014;30(17):2524–2526. https://doi.org/10.1093/bioinformatics/btu305 (PMID: 24794931)
Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. Journal of Proteome Research. 2016;15(4):1116–1125. https://doi.org/10.1021/acs.jproteome.5b00981 (PMID: 26906401)
Korkmaz S, Cox J, Grosse I, et al. Accurate and robust Bayesian inference of proteome-wide differential expression. Nature Methods. 2020;17(12):1215–1221. https://doi.org/10.1038/s41592-020-00949-9
Tyanova S, Temu T, Sinitcyn P, et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nature Methods. 2016;13(9):731–740. https://doi.org/10.1038/nmeth.3901 (PMID: 27348712)
Clark DJ, Dhanasekaran SM, Petralia F, et al. Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell. 2019;179(4):964–983.e31. https://doi.org/10.1016/j.cell.2019.10.007 (PMID: 31675502)
Mann M, Kumar C, Zeng W-F, et al. Ultra-fast label-free quantification and comprehensive proteome coverage with narrow-window data-independent acquisition. Nature Biotechnology. 2023;41(9):1225–1230. https://doi.org/10.1038/s41587-023-02099-7 (PMID: 38302753)
Rosenberger G, Yu F, Teo GC, et al. Exploration of cell state heterogeneity using single-cell proteomics through sensitivity-tailored data-independent acquisition. Nature Communications. 2023;14:5910. https://doi.org/10.1038/s41467-023-41602-1 (PMID: 37737208)
Zhang Y, Wen B, Lin L, et al. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nature Communications. 2024;15:3579. https://doi.org/10.1038/s41467-024-47899-w (PMID: 38724498)
Biskup K, Čuklina J, Mehnert M, et al. Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial. Molecular Systems Biology. 2021;17(8):e10232. https://doi.org/10.15252/msb.202110232 (PMID: 34432947)
Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics. Nature Methods. 2017;14(5):513–520. https://doi.org/10.1038/nmeth.4256 (PMID: 28394336)
da Veiga Leprevost F, Haynes SE, Avtonomov DM, et al. Philosopher: a versatile toolkit for shotgun proteomics data analysis. Nature Methods. 2020;17(9):869–870. https://doi.org/10.1038/s41592-020-0912-y (PMID: 32669682)
Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43(7):e47. https://doi.org/10.1093/nar/gkv007 (PMID: 25605792)
Cairns DA, Barrett JH, Billingham LJ, et al. Sample size determination in clinical proteomic profiling experiments using mass spectrometry for class comparison. Proteomics. 2009;9(1):74–86. https://doi.org/10.1002/pmic.200800417 (PMID: 19053145)

Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
