Statistical Analysis

Mendelian Randomization Analysis Service — Two-Sample Causal Inference from GWAS Summary Statistics with STROBE-MR-Aligned Sensitivity Analyses

Mendelian randomization (MR) uses genetic variants as instrumental variables to estimate the causal effect of an exposure on an outcome (Burgess et al., 2023). Pepkio delivers version-pinned two-sample MR from GWAS summary statistics (harmonized tables, inverse-variance weighted (IVW) and sensitivity analyses, plots, R scripts, and a Methods draft) for academic, biotech, and pharma teams; custom extensions are scoped at kickoff. MR studies rose from 3.1% to 13.0% of European Journal of Epidemiology submissions between 2020 and 2024 (Hemani et al., 2025).

Key facts

Key facts about Mendelian randomization

| Fact | Value |
| --- | --- |
| Supported platforms / instruments | OpenGWAS database IDs; client-supplied GWAS summary statistics (.tsv, .txt); UK Biobank, FinnGen, and GWAS Catalog exports when documented; one-sample MR from individual-level genotype and phenotype data when scoped separately |
| Input requirements | Per-SNP beta, standard error, effect allele, other allele, and sample size; several independent instruments after LD clumping for two-sample MR (Burgess et al., 2023); exposure and outcome GWAS ideally from non-overlapping cohorts or with overlap fraction documented (Burgess et al., 2016) |
| Reference builds supported | Human GRCh38 / hg38 (primary); GRCh37 / hg19 with documented liftover when scoped; LD clumping against 1000 Genomes Phase 3 EUR by default |
| Primary tools (with versions) | TwoSampleMR 0.7.4; ieugwasr 1.1.0; MendelianRandomization 0.10.0; MR-PRESSO 1.0; all pinned per project |
| Typical turnaround time | 2–4 weeks (standard two-sample MR); 3–5 weeks (multivariable MR, multiple exposure–outcome pairs, or scoped extensions); confirmed at kickoff |
| Deliverable formats | Harmonized GWAS .tsv; MR result tables (.csv, .xlsx); scatter, funnel, and leave-one-out plots (PDF/SVG); STROBE-MR mapping document; commented R scripts with renv.lock; Methods draft |
| Key cited best-practice reference | Burgess et al. (2023), Wellcome Open Research; Skrivankova et al. (2021), STROBE-MR, BMJ |
| Custom / bespoke analysis | MR-RAPS, multivariable or mediation MR, sample-overlap correction, colocalization extensions, client-specified clumping thresholds, pre-specified phenome-wide scans, and non-standard output formats scoped at kickoff |

What is Mendelian randomization?

Mendelian randomization treats germline genetic variants as instrumental variables to estimate causal exposure effects on outcomes from summary-level or individual-level association data (Sanderson et al., 2022). Pepkio's default workflow is two-sample MR: SNP–exposure associations come from one GWAS and SNP–outcome associations from a separate study, then IVW combines per-SNP Wald ratios (Burgess et al., 2013). Unlike observational regression, MR exploits quasi-random allele allocation at conception to reduce confounding and reverse causation—though IV assumptions must still be assessed (Burgess et al., 2023). Custom entry points and estimators are agreed at kickoff. See the Mendelian randomization glossary.
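As a sketch of the arithmetic behind this description, write $\hat\beta_{Xj}$ and $\hat\beta_{Yj}$ for the SNP–exposure and SNP–outcome estimates of variant $j$, and $\sigma_{Yj}$ for the standard error of $\hat\beta_{Yj}$. This notation is introduced here for illustration and is not taken from the cited papers. Under first-order weights, which ignore uncertainty in $\hat\beta_{Xj}$, the per-SNP Wald ratio, its approximate standard error, and the fixed-effect IVW estimate can be written as:

```latex
\hat\theta_j = \frac{\hat\beta_{Yj}}{\hat\beta_{Xj}}, \qquad
\operatorname{se}(\hat\theta_j) \approx \frac{\sigma_{Yj}}{|\hat\beta_{Xj}|}, \qquad
\hat\theta_{\mathrm{IVW}}
  = \frac{\sum_j \hat\theta_j \, \operatorname{se}(\hat\theta_j)^{-2}}{\sum_j \operatorname{se}(\hat\theta_j)^{-2}}
  = \frac{\sum_j \hat\beta_{Xj}\,\hat\beta_{Yj}\,\sigma_{Yj}^{-2}}{\sum_j \hat\beta_{Xj}^{2}\,\sigma_{Yj}^{-2}}
```

The last equality shows that the IVW estimate is also the slope of a weighted regression of $\hat\beta_{Yj}$ on $\hat\beta_{Xj}$ through the origin.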

When should you use Mendelian randomization?

Two-sample MR fits when you have genome-wide association summary data for a modifiable exposure and a disease or trait outcome, and need a causal estimate that observational studies cannot support alone (Burgess et al., 2023).

Comparison of two-sample Mendelian randomization, multivariable observational epidemiology, and randomized controlled trials

| Approach | Best for | Limitations | Approximate cost range |
| --- | --- | --- | --- |
| Two-sample Mendelian randomization | Causal effect of a modifiable exposure on an outcome using existing GWAS summary data | Requires valid IV assumptions; sensitive to horizontal pleiotropy, weak instruments, and sample overlap | Quote-based bioinformatics; no new wet-lab data required when public GWAS suffice |
| Multivariable observational epidemiology | Rich covariate adjustment when randomized trials are infeasible | Residual confounding and reverse causation limit causal interpretation | Lower per-analysis cost than new cohorts; higher risk of biased causal claims |
| Randomized controlled trial | Definitive causal inference for interventions | Cost, duration, and ethical constraints for many exposures | Highest per-study cost; reference standard when feasible |

  • LDL cholesterol and coronary heart disease: Holmes et al. (2015) found LDL-C genetic instruments associated with CHD risk in 17 studies (62,199 participants; 12,099 CHD events).
  • BMI across the phenome: Millard et al. (2019) identified 587 trait associations at 5% FDR in 334,968 UK Biobank participants, including adverse effects on diabetes and hypertension.
  • Apolipoprotein B versus LDL cholesterol for CHD: Richardson et al. (2020) found apolipoprotein B retained a causal CHD association (OR 1.92 per 1-SD increase) in multivariable MR with up to 60,801 CHD cases.

How the analysis works — step by step

  1. Scope the exposure–outcome pair and estimand

    Pepkio confirms biological plausibility of a causal pathway, matched population ancestry between GWAS sources, and pre-specified primary and sensitivity estimators before data extraction (Burgess et al., 2023; Skrivankova et al., 2021). Sample overlap between exposure and outcome cohorts is documented because partial overlap biases estimates toward the observational association (Burgess et al., 2016).

    Tools and outputs

    Tools used: Scope template; literature review notes

    Output: Signed scope document with exposure, outcome, estimand, and planned sensitivity methods

  2. Inventory GWAS inputs and metadata

    OpenGWAS IDs, client summary-statistics files, genome build, ancestry descriptor, trait units, and case–control versus continuous outcome type are recorded. Missing required columns (beta, se, effect allele, other allele, sample size) are flagged before instrument selection.

    Tools and outputs

    Tools used: Custom validation scripts; ieugwasr 1.1.0 gwasinfo() when OpenGWAS IDs are used

    Output: input_manifest.csv with GWAS accessions, build, ancestry, sample size, and overlap notes
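A minimal sketch of the required-column check described in this step, using only the Python standard library. The column names in `REQUIRED_COLUMNS` and the helper names are hypothetical; they are not the headers that Pepkio's validation scripts or any particular GWAS source use.

```python
import csv
import io

# Hypothetical column names; map them to the headers in your own files.
REQUIRED_COLUMNS = ("SNP", "beta", "se", "effect_allele", "other_allele", "samplesize")


def missing_columns(header):
    """Return the required columns absent from a header row, in a fixed order."""
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


def check_summary_stats(tsv_text):
    """Read the header of tab-separated summary statistics and report gaps."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    return missing_columns(header)


# Example file content with the sample-size column left out on purpose.
example = "SNP\tbeta\tse\teffect_allele\tother_allele\nrs1\t0.12\t0.02\tA\tG\n"
print(check_summary_stats(example))  # ['samplesize']
```

Running a check like this before instrument selection means a missing column surfaces as a named gap in the manifest rather than as a failure later in the pipeline.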

  3. Extract and clump exposure instruments

    Genome-wide significant SNPs (p < 5×10⁻⁸) associated with the exposure are extracted and LD-clumped (r² < 0.001 within a 10,000 kb window) against a European reference panel to obtain independent instruments (Hemani et al., 2018). Clumping parameters and the LD reference population are locked at kickoff.

    Tools and outputs

    Tools used: TwoSampleMR 0.7.4; ieugwasr 1.1.0

    Output: exposure_instruments.tsv with SNP, beta, SE, effect allele, other allele, p-value, and clumping metadata
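The selection logic in this step can be sketched as a greedy loop. This is a toy illustration, not the TwoSampleMR or PLINK implementation: the record layout and the `r2_lookup` dictionary are assumptions, and in practice pairwise r² comes from an LD reference panel rather than a hand-written table.

```python
P_THRESHOLD = 5e-8      # genome-wide significance, as stated in this step
R2_THRESHOLD = 0.001    # clumping r-squared threshold, as stated in this step
WINDOW_BP = 10_000_000  # 10,000 kb window expressed in base pairs


def clump(snps, r2_lookup):
    """Greedy clumping: keep the most significant SNP, drop correlated neighbours.

    snps: list of dicts with keys 'snp', 'chrom', 'pos', 'pval' (hypothetical schema).
    r2_lookup: dict mapping frozenset({snp_a, snp_b}) to r-squared; pairs that
    are absent are treated as uncorrelated.
    """
    candidates = sorted(
        (s for s in snps if s["pval"] < P_THRESHOLD), key=lambda s: s["pval"]
    )
    kept = []
    for cand in candidates:
        in_ld = any(
            cand["chrom"] == k["chrom"]
            and abs(cand["pos"] - k["pos"]) <= WINDOW_BP
            and r2_lookup.get(frozenset((cand["snp"], k["snp"])), 0.0) >= R2_THRESHOLD
            for k in kept
        )
        if not in_ld:
            kept.append(cand)
    return [k["snp"] for k in kept]


# Made-up records for illustration only.
toy_snps = [
    {"snp": "rs1", "chrom": 1, "pos": 1000, "pval": 1e-20},
    {"snp": "rs2", "chrom": 1, "pos": 5000, "pval": 1e-10},  # in LD with rs1
    {"snp": "rs3", "chrom": 2, "pos": 1000, "pval": 1e-9},
    {"snp": "rs4", "chrom": 1, "pos": 9000, "pval": 1e-5},   # not significant
]
toy_r2 = {frozenset(("rs1", "rs2")): 0.8}
print(clump(toy_snps, toy_r2))  # ['rs1', 'rs3']
```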

  4. Compute instrument strength

    Per-SNP and aggregate F-statistics quantify instrument–exposure association strength. F-statistics are reported as weak-instrument diagnostics—not used to post-hoc drop SNPs, which can introduce winner's-curse bias (Pierce et al., 2011; Sanderson et al., 2022).

    Tools and outputs

    Tools used: TwoSampleMR 0.7.4

    Output: instrument_strength.csv with per-SNP F-statistics, exposure R², and aggregate F
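One common approximation takes the per-SNP F-statistic as the squared z-score of the SNP–exposure association. The sketch below uses that approximation and reports the mean F as a simple aggregate; both choices are assumptions made here for illustration and may differ from the exact formulas used in a given project.

```python
def per_snp_f(beta, se):
    """Approximate single-variant F-statistic as the squared z-score."""
    return (beta / se) ** 2


def summarize_strength(instruments):
    """instruments: list of (snp, beta_exposure, se_exposure) tuples.

    Returns a dict of per-SNP F-statistics and their unweighted mean.
    """
    f_stats = {snp: per_snp_f(beta, se) for snp, beta, se in instruments}
    mean_f = sum(f_stats.values()) / len(f_stats)
    return f_stats, mean_f


# Made-up values for illustration only.
toy_instruments = [("rs1", 0.10, 0.01), ("rs2", 0.06, 0.01), ("rs3", 0.04, 0.01)]
f_stats, mean_f = summarize_strength(toy_instruments)
for snp, f_value in f_stats.items():
    print(snp, round(f_value, 1))
print("mean F", round(mean_f, 1))
```

Consistent with the step above, the values are reported as diagnostics; nothing in this sketch filters SNPs on F.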

  5. Extract outcome associations for instruments

    The same instrument SNPs are queried in the outcome GWAS. SNPs absent from the outcome dataset are logged; LD-proxy lookup via OpenGWAS is attempted when appropriate and documented (Hemani et al., 2018).

    Tools and outputs

    Tools used: TwoSampleMR 0.7.4; ieugwasr 1.1.0

    Output: outcome_snps.tsv with SNP-level outcome beta, SE, and availability flags
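The availability flags in this step amount to partitioning the instrument list by whether each SNP appears in the outcome data. A minimal sketch follows, assuming the outcome associations are already loaded into a dictionary keyed by rsID (a hypothetical layout); proxy lookup is not shown.

```python
def split_by_availability(instrument_snps, outcome_table):
    """Partition instruments into those found in the outcome GWAS and those missing.

    outcome_table: dict mapping rsID to (beta_outcome, se_outcome).
    """
    found = {snp: outcome_table[snp] for snp in instrument_snps if snp in outcome_table}
    missing = [snp for snp in instrument_snps if snp not in outcome_table]
    return found, missing


# Made-up values for illustration only.
toy_outcome = {"rs1": (0.05, 0.01), "rs3": (0.02, 0.01)}
found, missing = split_by_availability(["rs1", "rs2", "rs3"], toy_outcome)
print(sorted(found))  # ['rs1', 'rs3']
print(missing)        # ['rs2']
```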

  6. Harmonize effect alleles

    Effect alleles are aligned so exposure and outcome betas refer to the same allele. Palindromic SNPs (A/T or C/G) are resolved using allele frequency checks or removed when strand cannot be determined; every decision is logged (Burgess et al., 2023).

    Tools and outputs

    Tools used: TwoSampleMR 0.7.4

    Output: harmonized_dat.tsv; harmonization_log.csv listing flipped, removed, and ambiguous SNPs
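The allele-alignment rules in this step can be sketched for a single SNP as below. This is an illustrative simplification rather than the TwoSampleMR harmonization routine: the function and action names are hypothetical, and palindromic SNPs are only flagged, with the allele-frequency check left out.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(a1, a2):
    """A/T and C/G SNPs read the same on both strands."""
    return COMPLEMENT[a1] == a2


def harmonize(exp_ea, exp_oa, out_ea, out_oa, out_beta):
    """Return (aligned outcome beta, action) relative to the exposure effect allele.

    Action is one of 'kept', 'flipped', 'ambiguous' or 'mismatch'.
    """
    if is_palindromic(exp_ea, exp_oa):
        return None, "ambiguous"
    # Try the outcome alleles as given, then on the complementary strand.
    for ea, oa in ((out_ea, out_oa), (COMPLEMENT[out_ea], COMPLEMENT[out_oa])):
        if (ea, oa) == (exp_ea, exp_oa):
            return out_beta, "kept"
        if (ea, oa) == (exp_oa, exp_ea):
            return -out_beta, "flipped"
    return None, "mismatch"


print(harmonize("A", "G", "G", "A", 0.05))  # (-0.05, 'flipped')
print(harmonize("A", "G", "T", "C", 0.05))  # (0.05, 'kept')
print(harmonize("A", "T", "A", "T", 0.05))  # (None, 'ambiguous')
```

Returning the action alongside the beta makes it straightforward to write one row per SNP to a harmonization log.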

  7. Run primary Mendelian randomization

    The primary analysis uses the IVW multi-instrument estimator under an assumption of balanced pleiotropy across instruments (Burgess et al., 2013). Effect estimates are reported per unit of exposure on the scale documented in the scope (e.g., log-odds of the outcome per 1-SD increase in the exposure).

    Tools and outputs

    Tools used: MendelianRandomization 0.10.0 (Yavorska & Burgess, 2017); TwoSampleMR 0.7.4

    Output: mr_primary_results.csv with columns method, exposure, outcome, nsnp, beta, se, pval
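A sketch of the IVW calculation from harmonized summary statistics, written as a weighted regression through the origin. It is a plain-Python illustration and not the MendelianRandomization or TwoSampleMR code; the multiplicative random-effects standard error shown here is one common convention and is an assumption about how heterogeneity would be handled.

```python
import math


def ivw(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate from summary statistics.

    Equivalent to regressing beta_y on beta_x through the origin with weights
    1 / se_y**2. Returns (estimate, fixed-effect SE, random-effects SE).
    Needs at least two SNPs.
    """
    weights = [1.0 / s**2 for s in se_y]
    sxx = sum(w * bx * bx for w, bx in zip(weights, beta_x))
    sxy = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    estimate = sxy / sxx
    se_fixed = math.sqrt(1.0 / sxx)
    # Multiplicative random effects: scale the SE by the residual standard
    # error, but never below 1, so the interval is not narrower than fixed-effect.
    resid = sum(
        w * (by - estimate * bx) ** 2 for w, bx, by in zip(weights, beta_x, beta_y)
    )
    phi = max(1.0, math.sqrt(resid / (len(beta_x) - 1)))
    return estimate, se_fixed, se_fixed * phi


# Made-up values: every SNP implies a ratio of 0.5, so the estimate is about 0.5.
est, se_fe, se_re = ivw([0.10, 0.20, 0.15], [0.05, 0.10, 0.075], [0.01, 0.01, 0.01])
print(round(est, 3), round(se_fe, 4), round(se_re, 4))
```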

  8. Run sensitivity and robustness analyses

    Weighted median, MR-Egger, and leave-one-out analyses test stability to horizontal pleiotropy and outlier SNPs (Bowden et al., 2016; Burgess & Thompson, 2017). MR-PRESSO global and outlier tests detect pleiotropic distortion (Verbanck et al., 2018). Cochran's Q quantifies heterogeneity across SNP-specific estimates. RadialMR outlier tests are included when scoped.

    Tools and outputs

    Tools used: MendelianRandomization 0.10.0; MR-PRESSO 1.0; TwoSampleMR 0.7.4; RadialMR when scoped

    Output: mr_sensitivity_results.csv; pleiotropy_diagnostics.csv
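Two of the quantities in this step can be sketched from per-SNP ratio estimates: the interpolated weighted median and Cochran's Q. The code is a plain-Python illustration under first-order weights, not the packaged implementations, and it omits the bootstrap standard error that normally accompanies the weighted median.

```python
def wald_ratios(beta_x, beta_y, se_y):
    """Per-SNP ratio estimates and first-order inverse-variance weights."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [(bx / s) ** 2 for bx, s in zip(beta_x, se_y)]
    return ratios, weights


def weighted_median(ratios, weights):
    """Interpolated weighted median of the ratio estimates (needs >= 2 SNPs)."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    total = sum(weights)
    cum, points = 0.0, []
    for i in order:
        cum += weights[i]
        points.append(((cum - 0.5 * weights[i]) / total, ratios[i]))
    # Last point whose cumulative position is below 0.5, then interpolate.
    below = max(k for k, (p, _) in enumerate(points) if p < 0.5)
    (p0, r0), (p1, r1) = points[below], points[below + 1]
    return r0 + (r1 - r0) * (0.5 - p0) / (p1 - p0)


def cochran_q(ratios, weights):
    """Cochran's Q around the IVW estimate; compare to chi-square with n - 1 df."""
    ivw_est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return sum(w * (r - ivw_est) ** 2 for w, r in zip(weights, ratios))


# Made-up ratios: the third SNP is an outlier.
toy_ratios, toy_weights = [0.4, 0.5, 2.0], [1.0, 1.0, 1.0]
print(round(weighted_median(toy_ratios, toy_weights), 3))
print(round(cochran_q(toy_ratios, toy_weights), 3))
```

In this toy input the weighted median stays near the two concordant ratios, while the outlier pulls the weighted mean upward and inflates Q.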

  9. Generate diagnostic figures

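    Scatter plots display SNP–outcome associations against SNP–exposure associations, with each estimator's fitted slope overlaid; funnel plots display per-SNP Wald ratios against their precision to reveal asymmetry; and leave-one-out influence plots show whether single SNPs drive the aggregate estimate (Burgess et al., 2023). Per-SNP forest plots are included when scoped.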

    Tools and outputs

    Tools used: TwoSampleMR 0.7.4; ggplot2 via MendelianRandomization 0.10.0

    Output: MR scatter plot, funnel plot, leave-one-out plot, and optional forest plot (PDF/SVG)
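The numbers behind a leave-one-out plot can be sketched by re-estimating IVW with each SNP removed in turn. This is a plain-Python illustration with hypothetical function names; plotting is left out.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW slope through the origin."""
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx * bx / s**2 for bx, s in zip(beta_x, se_y))
    return num / den


def leave_one_out(snps, beta_x, beta_y, se_y):
    """Re-estimate IVW with each SNP removed in turn; returns {snp: estimate}."""
    results = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        results[snp] = ivw_estimate(
            [beta_x[j] for j in keep],
            [beta_y[j] for j in keep],
            [se_y[j] for j in keep],
        )
    return results


# Made-up values: rs4 has a much larger outcome effect than the other SNPs.
loo = leave_one_out(
    ["rs1", "rs2", "rs3", "rs4"],
    [0.1, 0.1, 0.1, 0.1],
    [0.05, 0.05, 0.05, 0.20],
    [0.01, 0.01, 0.01, 0.01],
)
for snp, estimate in loo.items():
    print(snp, round(estimate, 3))
```

Here the estimate drops sharply only when rs4 is left out, which is the pattern a leave-one-out plot is meant to expose.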

  10. Map results to STROBE-MR and package deliverables

    Results are mapped to STROBE-MR checklist items covering instrument selection, assumption assessment, and sensitivity reporting (Skrivankova et al., 2021). An interpretation memo states where exclusion restriction or weak-instrument limitations may apply. Commented R scripts, lock files, README, and a Methods draft cite exact GWAS accessions and software versions.

    Tools and outputs

    Tools used: Custom STROBE-MR mapping template

    Output: strobe_mr_mapping.csv; interpretation memo; final deliverable bundle; Methods draft

What Pepkio delivers

Processed data files

  • harmonized_dat.tsv; exposure_instruments.tsv; instrument_strength.csv
  • mr_primary_results.csv; mr_sensitivity_results.csv; harmonization_log.csv; pleiotropy_diagnostics.csv

Figures (PDF/SVG)

  • MR scatter plot; funnel plot; leave-one-out influence plot
  • Per-SNP forest plot when scoped

Tables

  • Primary MR results (method, exposure, outcome, nsnp, beta, se, pval)
  • Instrument table (SNP, beta_exposure, se_exposure, beta_outcome, se_outcome, eaf, F_stat)
  • Sensitivity results (method, beta, se, pval, pleiotropy_test, heterogeneity_Q, heterogeneity_p)
  • STROBE-MR checklist mapping

Code

  • Commented R scripts per analysis stage; renv.lock listing exact package versions
  • Delivery via private Git repository or agreed secure file transfer — you retain full ownership

Documentation

  • Harmonization and instrument-selection log; README with rerun instructions
  • Journal-formatted Methods draft citing software versions and GWAS accession IDs
  • STROBE-MR mapping document

Custom and post-delivery support

  • Non-standard estimators (MR-RAPS, multivariable MR), client-specified table or figure formats, and additional exposure–outcome pairs defined at kickoff
  • Methods clarification, assumption interpretation, and minor figure or table revisions within agreed scope (typically ≤20% of project scope)

Technical decisions we make — and why

Primary estimator: IVW with weighted median as robust companion
Pepkio reports IVW as the primary multi-instrument estimator when instruments are independent after clumping (Burgess et al., 2013). Weighted median MR is run in parallel because it remains consistent when up to 50% of the information comes from invalid instruments under pleiotropy (Bowden et al., 2016). MR-Egger is interpreted cautiously when the Egger intercept indicates directional pleiotropy (Burgess & Thompson, 2017).
LD clumping: r² < 0.001, 10,000 kb window, 1000 Genomes Phase 3 EUR
These parameters match TwoSampleMR defaults for independent instrument selection in European-ancestry GWAS (Hemani et al., 2018). Population-matched LD panels are substituted when ancestry or scope requires it.
Instrument inclusion: report F-statistics, do not auto-exclude weak instruments
F-statistics are reported as weak-instrument diagnostics; using F as a post-hoc inclusion filter can worsen bias through winner's curse (Sanderson et al., 2022). Pepkio documents weak-instrument limitations in the interpretation memo rather than silently dropping SNPs.
Sample overlap: prefer non-overlapping GWAS; quantify partial overlap
With 50% participant overlap and F = 10, bias under a null causal effect is roughly half the one-sample weak-instrument bias, which corresponds to about 5% of the bias in the confounded observational estimate (Burgess et al., 2016). Pepkio documents overlap fractions from GWAS metadata and flags pairs where overlap may inflate Type I error; overlap-correction methods are scoped when non-overlapping sources are unavailable.
Palindromic SNPs: resolve strand with allele frequency or exclude
Allele harmonization errors from strand mismatches are a common source of spurious MR signals (Burgess et al., 2023). Palindromic SNPs with ambiguous strand are removed or inferred using allele frequency concordance; every removed SNP is listed in harmonization_log.csv.
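The sample-overlap figure above can be reproduced with back-of-the-envelope arithmetic. The sketch rests on two simplifying assumptions stated in its docstring; it is an approximation for intuition, not a substitute for the formulas in Burgess et al. (2016), and the function name is hypothetical.

```python
def approx_relative_bias(overlap_fraction, mean_f):
    """Approximate weak-instrument bias as a fraction of the confounded
    observational bias, under a null causal effect.

    Simplifications: one-sample relative bias is roughly 1 / F, and bias scales
    linearly with the proportion of overlapping participants.
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")
    return overlap_fraction / mean_f


print(approx_relative_bias(1.0, 10))  # 0.1  (full overlap, one-sample case)
print(approx_relative_bias(0.5, 10))  # 0.05 (50% overlap)
print(approx_relative_bias(0.0, 10))  # 0.0  (no overlap)
```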

Common questions

What GWAS data do I need to provide for a Mendelian randomization project?

You need exposure and outcome GWAS summary statistics with per-SNP beta, standard error, effect allele, other allele, and sample size, or OpenGWAS database IDs we can query on your behalf. After LD clumping, most two-sample MR analyses use several independent instruments, though the count depends on trait polygenicity and outcome GWAS sample size (Burgess et al., 2023; Pierce et al., 2011). Population ancestry should match between exposure and outcome sources. Custom column formats are accepted when documented at kickoff.

Can Pepkio work with incomplete or poor-quality GWAS summary statistics?

Yes, within documented limits. Missing SNPs in the outcome GWAS are logged and LD-proxy lookup is attempted where appropriate (Hemani et al., 2018). Files lacking required columns, mixed genome builds without liftover, or summary statistics with undocumented allele coding require preprocessing scoped at kickoff. If too few instruments survive harmonization for stable inference, Pepkio reports this before finalizing interpretation.

Which GWAS databases and summary-statistics sources do you support?

Pepkio routinely works with OpenGWAS database IDs, UK Biobank summary-data exports, FinnGen release files, GWAS Catalog full summary statistics, and client in-house .tsv or .txt files. OpenGWAS authentication tokens are configured when protected endpoints are required (ieugwasr 1.1.0). Non-GRCh38 builds and non-European ancestry cohorts are supported when LD reference panels and harmonization rules are agreed at kickoff.

How long does a Mendelian randomization analysis take at Pepkio?

Standard two-sample MR for one pre-specified exposure–outcome pair typically completes in 2–4 weeks from data receipt. Projects involving multivariable MR, multiple exposure–outcome pairs, or scoped extensions may take 3–5 weeks. Exact timelines are confirmed at kickoff with milestone check-ins when scoped.

How do you handle population stratification and sample overlap between GWAS?

MR does not use batch correction in the sequencing sense; instead, Pepkio matches ancestry between exposure and outcome GWAS and documents sample overlap from study metadata (Burgess et al., 2016). Partial overlap between cohorts biases estimates toward the confounded observational association; Pepkio flags overlap and discusses overlap-correction approaches when non-overlapping GWAS are unavailable. Mismatched ancestry without appropriate LD panels is flagged at kickoff.

Do I receive the analysis code—and do I own it?

Yes—you retain full ownership of all scripts and results. Pepkio delivers commented R scripts with renv.lock files listing exact package versions, organized by pipeline stage with a README. Delivery is via private Git repository or agreed secure file transfer.

Can I be involved during the Mendelian randomization analysis?

Yes, when scoped at kickoff. Checkpoint reviews can occur after instrument selection, after harmonization, and before final delivery. You can review exposure–outcome pairs, clumping parameters, sensitivity methods, and interpretation within agreed scope. A dedicated PhD-level scientific contact leads the project.

What does post-delivery reviewer support cover?

Support includes clarification of MR methods, harmonization decisions, pleiotropy diagnostics, and minor figure or table revisions within agreed scope (typically ≤20% of project scope). Pepkio drafts Methods and Supplementary text for analyses we performed. New exposure–outcome pairs or sensitivity methods requested by reviewers are scoped as separate milestones.

Is co-authorship required when working with Pepkio?

No. Pepkio operates as a fee-for-service provider and does not require co-authorship unless explicitly discussed in advance. Acknowledgment of bioinformatics support in the Acknowledgments section is standard practice.

How does Pepkio handle horizontal pleiotropy in Mendelian randomization?

Horizontal pleiotropy violates the exclusion restriction when instruments affect the outcome through pathways other than the exposure (Verbanck et al., 2018). Pepkio runs weighted median MR, MR-Egger, MR-PRESSO, and leave-one-out analyses alongside IVW. When estimates diverge across methods, the interpretation memo treats causal claims cautiously and documents outlier SNPs.
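To make the MR-Egger check concrete, the sketch below fits a weighted regression of SNP–outcome on SNP–exposure estimates with an intercept, after orienting each SNP so its exposure effect is positive. It is a plain-Python illustration with a hypothetical function name, not the packaged implementation, and it omits standard errors and the intercept test.

```python
def mr_egger(beta_x, beta_y, se_y):
    """Weighted regression of beta_y on beta_x with an intercept.

    SNPs are first oriented so every exposure beta is positive. Returns
    (intercept, slope); an intercept far from zero suggests directional pleiotropy.
    """
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_x]
    x = [abs(bx) for bx in beta_x]
    y = [s * by for s, by in zip(signs, beta_y)]
    w = [1.0 / s**2 for s in se_y]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up values built as beta_y = 0.02 + 0.5 * beta_x, so the fit should
# recover an intercept near 0.02 and a slope near 0.5.
intercept, slope = mr_egger([0.1, 0.2, 0.3], [0.07, 0.12, 0.17], [0.01, 0.01, 0.01])
print(round(intercept, 3), round(slope, 3))
```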

Can Pepkio run multivariable or mediation Mendelian randomization?

Yes, when scoped at kickoff. Multivariable MR for correlated exposures and two-step mediation MR are supported using TwoSampleMR 0.7.4 and MendelianRandomization 0.10.0, with sensitivity analyses per Burgess et al. (2023). Network MR, MR-RAPS, and complex overlap corrections require explicit scoping.

What happens if my exposure and outcome GWAS share the same participants?

Partial overlap biases two-sample MR toward the observational association; with 50% overlap, relative bias is approximately half the one-sample bias under F = 10 and a null causal effect (Burgess et al., 2016). Pepkio documents overlap from metadata, prefers non-overlapping sources when available, and scopes overlap-adjustment methods otherwise. Fully overlapping one-sample MR is scoped separately.

Related services

  • Experimental design: Prospective power and sample-size planning before collecting omics data that may support downstream colocalization with MR loci when scoped.
  • Variant calling: Variant-level annotation and LD context when building custom instrument sets from in-house genotype data.
  • Bulk RNA-seq: Expression quantification for colocalization analyses at GWAS loci when scoped alongside MR.
  • Bioinformatics consulting: Feasibility assessment, exposure–outcome pair selection, and GWAS source identification before committing to an MR project.
  • Custom analysis: Non-standard MR extensions, multi-trait integration, or bespoke reporting beyond the standard two-sample workflow.

References

  1. Burgess S, Davey Smith G, Davies NM, et al. Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Research. 2023;4:186. https://doi.org/10.12688/wellcomeopenres.15555.3 (PMID: 32760811)
  2. Skrivankova VW, Richmond RC, Woolf BAR, et al. Strengthening the reporting of observational studies in epidemiology using Mendelian randomisation (STROBE-MR): explanation and elaboration. BMJ. 2021;375:n2233. https://doi.org/10.1136/bmj.n2233 (PMID: 34702754)
  3. Hemani G, Zheng J, Elsworth B, et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife. 2018;7:e34408. https://doi.org/10.7554/eLife.34408 (PMID: 29846171)
  4. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology. 2013;37(7):658–665. https://doi.org/10.1002/gepi.21758 (PMID: 24114802)
  5. Burgess S, Thompson SG. Interpreting findings from Mendelian randomization using the MR-Egger method. European Journal of Epidemiology. 2017;32(5):377–389. https://doi.org/10.1007/s10654-017-0255-x (PMID: 28527048)
  6. Verbanck M, Chen C-Y, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nature Genetics. 2018;50(5):693–698. https://doi.org/10.1038/s41588-018-0099-7 (PMID: 29686387)
  7. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genetic Epidemiology. 2016;40(4):304–314. https://doi.org/10.1002/gepi.21965 (PMID: 27061298)
  8. Burgess S, Davies NM, Thompson SG. Bias due to participant overlap in two-sample Mendelian randomization. Genetic Epidemiology. 2016;40(7):597–608. https://doi.org/10.1002/gepi.21998 (PMID: 27625185)
  9. Pierce BL, Ahsan H, VanderWeele TJ. Power and instrument strength requirements for Mendelian randomization studies using multiple genetic variants. International Journal of Epidemiology. 2011;40(3):740–752. https://doi.org/10.1093/ije/dyq151 (PMID: 20813862)
  10. Sanderson E, Glymour MM, Holmes MV, et al. Mendelian randomization. Nature Reviews Methods Primers. 2022;2:6. https://doi.org/10.1038/s43586-021-00092-5
  11. Yavorska OO, Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. International Journal of Epidemiology. 2017;46(6):1734–1739. https://doi.org/10.1093/ije/dyx034 (PMID: 28398548)
  12. Hemani G, Stender S, Wolters FJ, et al. The rapid growth in Mendelian randomization studies. European Journal of Epidemiology. 2025;40(10):1165–1171. https://doi.org/10.1007/s10654-025-01317-7 (PMID: 41196509)
  13. Holmes MV, Asselbergs FW, Palmer TM, et al. Mendelian randomization of blood lipids for coronary heart disease. European Heart Journal. 2015;36(9):539–550. https://doi.org/10.1093/eurheartj/eht571 (PMID: 24474739)
  14. Millard LAC, Davies NM, Tilling K, Gaunt TR, Davey Smith G. Searching for the causal effects of body mass index in over 300,000 participants in UK Biobank, using Mendelian randomization. PLoS Genetics. 2019;15(2):e1007951. https://doi.org/10.1371/journal.pgen.1007951 (PMID: 30707692)
  15. Richardson TG, Sanderson E, Palmer TM, et al. Evaluating the relationship between circulating lipoprotein lipids and apolipoproteins with risk of coronary heart disease: a multivariable Mendelian randomisation analysis. PLoS Medicine. 2020;17(3):e1003062. https://doi.org/10.1371/journal.pmed.1003062 (PMID: 32203549)

Let's Talk About Your Science

Tell us:

  • Your biological question
  • Data type and size
  • Timeline constraints

We'll tell you:

  • What's feasible
  • How long it will take
  • Exactly what it will cost

Contact us to start with a free consultation. Need everyday bench calculators? Try our free lab tools.