Transcriptomics Analysis Services — Version-Pinned Workflows Across Bulk, Single-Cell, Spatial, and Long-Read Data

Transcriptomics measures genome-wide RNA abundance to profile which genes and transcripts are active under specific conditions. Pepkio's transcriptomics analysis service delivers version-pinned pipelines from raw FASTQs or vendor outputs to annotated results, with full code ownership, publication figures, and a Methods draft. We serve academic, biotech, and pharma teams—and fully support custom inputs, outputs, and non-standard analyses scoped at kickoff.

Key facts

Key facts about transcriptomics analysis
Fact	Value
Data types supported	Bulk RNA-seq (Illumina FASTQs or count matrices); 10x Chromium scRNA-seq; 10x Visium/Xenium spatial transcriptomics; PacBio HiFi and Oxford Nanopore long-read RNA-seq
Reference builds or standards used	Human GRCh38 (GENCODE v44 / Ensembl 110); mouse GRCm39 (GENCODE vM33 / Ensembl 110); ENCODE RNA-seq standards for bulk depth and replicate guidance (ENCODE Consortium, 2016)
Primary tools (with versions)	STAR 2.7.11b; DESeq2 1.52.0; Cell Ranger 10.0.0; Scanpy 1.12.1; Seurat 5.2.1; Space Ranger 3.1.3; Squidpy 1.8.1; IsoQuant 3.13.0; SQANTI3 6.0.1; clusterProfiler 4.20.0
Typical turnaround range	2–4 weeks (bulk); 3–5 weeks (scRNA-seq); 4–6 weeks (spatial or long-read); 5–10 weeks (multi-cohort integration or HD spatial)—confirmed at kickoff
Deliverable formats	.csv/.tsv count matrices; .rds/.h5ad objects; .gtf isoform catalogs; PDF/SVG figures; HTML QC reports; commented R/Python scripts; Methods draft
Regulatory/reproducibility standards followed	ENCODE RNA-seq guidelines; MINSEQE-aligned sample metadata and reporting; version-pinned software with sessionInfo() or conda lock files; private Git or Zenodo archival on request
Custom / bespoke analysis	Non-standard inputs (pre-aligned BAM, vendor matrices), output formats, methods, and analyses beyond standard workflows—e.g., cross-modality integration, fusion calling, trajectory inference, or client-specified contrasts—scoped at kickoff

What is transcriptomics?

Transcriptomics is the genome-wide measurement of RNA—messenger RNA (mRNA), long non-coding RNA, and other transcript classes—to quantify which genes are expressed, at what level, and in which cells or tissue regions. Unlike targeted assays such as qPCR, RNA sequencing (RNA-seq) captures novel splice junctions and unannotated transcripts when depth permits (Conesa et al., 2016). The core question is which transcripts differ between conditions, cell types, or spatial domains, and what pathways they implicate. Adoption is substantial: GTEx v8 profiled 17,382 RNA-seq samples across 52 tissues and two cell lines from 948 donors (GTEx Consortium, 2020).

What transcriptomics analysis can answer

Published examples of biological questions transcriptomics can address:

Which genes differ between disease and control tissue? Sun et al. (2019) analyzed TCGA bulk RNA-seq from 103 primary and 368 metastatic melanoma samples, identifying 246 differentially expressed lncRNAs and 856 mRNAs between stages.
Which T cell populations expand during anti-PD-1 therapy in lung cancer? Liu et al. (2022) profiled 47 NSCLC biopsies from 36 patients before and after PD-1 therapy, showing precursor exhausted T cells accumulate in responders through clonal revival rather than reinvigoration of terminally exhausted cells.
How is gene expression organized across tissue architecture? Arora et al. (2023) applied spatial transcriptomics to oral squamous cell carcinoma, identifying conserved tumor core and leading-edge domains linked to survival and therapy response.
Which full-length isoforms switch between conditions? Amarasinghe et al. (2020) reviewed long-read RNA-seq applications showing that full-length reads resolve isoform structures that short-read fragments cannot reconstruct reliably.
Which genetic variants alter tissue-specific expression? GTEx v8 mapped cis-expression quantitative trait loci (cis-eQTLs) across 49 tissues from 838 donors, enabling tissue-context interpretation of regulatory variants (GTEx Consortium, 2020).

Services included in this category

Pepkio's transcriptomics category covers bulk RNA-seq, single-cell RNA-seq, spatial transcriptomics, and long-read RNA-seq—each with a dedicated spoke page for inputs, tools, and deliverables.

Transcriptomics services offered by Pepkio
Service	Description	Primary tools
Bulk RNA-seq	Pooled-tissue differential expression from FASTQs or count matrices to DESeq2 contrasts and pathway enrichment	STAR 2.7.11b, featureCounts 2.0.8, DESeq2 1.52.0
Single-Cell RNA-seq	Per-cell clustering, annotation, and differential expression from 10x Chromium FASTQs to annotated UMAPs	Cell Ranger 10.0.0, Scanpy 1.12.1, Seurat 5.2.1
Spatial Transcriptomics	Tissue-coordinate expression, spatial domains, and cell-type deconvolution from Visium or Xenium data	Space Ranger 3.1.3, Squidpy 1.8.1, cell2location 0.1.5
Long-Read RNA-seq	Full-length isoform catalogs and isoform-level differential expression from PacBio HiFi or ONT reads	minimap2 2.31, IsoQuant 3.13.0, SQANTI3 6.0.1

What Pepkio delivers

Pepkio returns reproducible, analysis-ready outputs—not summary slides alone:

Processed data

Filtered count matrices; DESeq2 result objects or annotated .rds/.h5ad objects; SQANTI3-filtered GTF catalogs; sorted BAM files on request

Figures and tables

MultiQC, PCA, volcano/MA, UMAP, and spatial maps (PDF/SVG); sample_manifest.csv; DEG and enrichment tables with log2FoldChange, padj, and gene descriptions

Code and documentation

Commented R/Python scripts; environment lock files; HTML QC report; README; Methods draft citing exact versions—you retain full ownership
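
For the Python side of a project, the version record behind a Methods draft could be captured along the following lines. This is a minimal sketch using only the standard library; the package list and the JSON layout are illustrative assumptions, not a description of Pepkio's actual tooling.

```python
import json
import platform
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version.

    Packages that are not installed are recorded as None instead of
    raising, so the manifest still documents what was looked for.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_manifest(packages):
    """Bundle interpreter and package versions into one JSON-ready dict."""
    return {
        "python": platform.python_version(),
        "packages": collect_versions(packages),
    }


# Hypothetical package list for a Python-side single-cell or spatial project.
manifest = build_manifest(["scanpy", "squidpy", "anndata"])
print(json.dumps(manifest, indent=2, sort_keys=True))
```

A manifest like this complements a conda lock file: the lock file rebuilds the environment, while the manifest records what was actually loaded when the analysis ran.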

Support

Milestone check-ins; reviewer clarification and minor revisions within agreed scope (typically ≤20% of deliverables)

Non-standard contrasts, cross-modality integration, or client-specified figures are scoped at kickoff.

How the analysis works — step by step

1. Scope study design and select modality
Confirm the biological question, replicates, platform, and modality; flag batch confounding or modality mismatch before analysis (Conesa et al., 2016; Luecken & Theis, 2019).
Tools and outputs
Output: signed scope with contrasts
2. Validate inputs and record metadata
Verify FASTQ integrity, chemistry, and sample IDs; check vendor outputs against annotation builds.
Tools and outputs
Tools used: FastQC 0.12.1
Output: sample_manifest.csv
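
A manifest check of this kind might look like the sketch below. The column names (sample_id, condition, batch) are an assumed schema chosen for illustration; the actual layout of sample_manifest.csv is agreed per project.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "condition", "batch")  # assumed schema


def validate_manifest(handle):
    """Return a list of human-readable problems found in a manifest CSV."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = (row["sample_id"] or "").strip()
        if not sample_id:
            problems.append(f"line {line_no}: empty sample_id")
        elif sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
        for col in ("condition", "batch"):
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
    return problems


# Toy manifest with one duplicated ID and one missing condition.
example = io.StringIO(
    "sample_id,condition,batch\n"
    "S1,control,A\n"
    "S2,treated,A\n"
    "S2,treated,B\n"
    "S4,,B\n"
)
for problem in validate_manifest(example):
    print(problem)
```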
3. Run quality control and preprocessing
Assess adapter contamination and library quality per sample; aggregate metrics for review (Ewels et al., 2016).
Tools and outputs
Tools used: fastp 0.24.0; MultiQC 1.25.1; NanoPlot 1.42.0 (long-read)
Output: multiqc_report.html
4. Quantify expression by modality
Bulk: STAR 2.7.11b (Dobin et al., 2013) + featureCounts 2.0.8 (Liao et al., 2014). scRNA-seq: Cell Ranger 10.0.0. Spatial: Space Ranger 3.1.3. Long-read: minimap2 2.31 (Li, 2018) + IsoQuant 3.13.0.
Tools and outputs
Output: count matrices and alignment summaries
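
For the bulk branch, the alignment and counting commands could be assembled as in the following sketch. It only builds and prints the command lines; it does not run them. The file paths, thread count, and strandedness setting are hypothetical placeholders, and the real parameters depend on the library prep and are fixed at kickoff.

```python
import shlex


def star_command(genome_dir, read1, read2, prefix, threads=8):
    """Build (but do not run) a STAR command for one paired-end sample."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",  # inputs assumed gzip-compressed
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


def featurecounts_command(gtf, bam_files, out_path, threads=8, strandedness=0):
    """Build a featureCounts command that counts paired-end fragments.

    strandedness follows the featureCounts -s convention:
    0 = unstranded, 1 = stranded, 2 = reversely stranded.
    """
    return [
        "featureCounts",
        "-T", str(threads),
        "-p", "--countReadPairs",
        "-s", str(strandedness),
        "-a", gtf,
        "-o", out_path,
        *bam_files,
    ]


# Hypothetical paths, for illustration only.
align = star_command("ref/star_index", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "aligned/S1_")
count = featurecounts_command(
    "ref/annotation.gtf",
    ["aligned/S1_Aligned.sortedByCoord.out.bam"],
    "counts/gene_counts.txt",
)
print(shlex.join(align))
print(shlex.join(count))
```

Keeping commands as argument lists rather than shell strings makes them easy to log verbatim in a parameter record and avoids quoting problems with unusual file names.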
5. Audit sample and batch integrity
Compute library sizes, detection rates, and read-distribution metrics; use PCA to check whether samples separate by condition rather than by batch (Love et al., 2014; Wang et al., 2024).
Tools and outputs
Tools used: DESeq2 1.52.0; RSeQC 5.0.3 (Wang et al., 2012)
Output: sample_qc_summary.csv; PCA plots
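
Two of the metrics named in this step, library size and detection rate, can be computed from a raw count matrix as sketched here. The toy counts and the flagging rule (library size below half the median) are assumptions made for illustration, not fixed thresholds.

```python
from statistics import median


def sample_qc(counts, sample_ids, low_fraction=0.5):
    """Summarize per-sample library size and gene detection rate.

    counts maps gene -> list of raw counts, one entry per sample, in the
    same order as sample_ids. A sample is flagged when its library size
    falls below low_fraction times the median library size (an assumed,
    illustrative rule).
    """
    n_genes = len(counts)
    summary = {}
    for j, sample in enumerate(sample_ids):
        column = [row[j] for row in counts.values()]
        detected = sum(1 for value in column if value > 0)
        summary[sample] = {
            "library_size": sum(column),
            "detection_rate": detected / n_genes if n_genes else 0.0,
        }
    cutoff = low_fraction * median(s["library_size"] for s in summary.values())
    for stats in summary.values():
        stats["low_library"] = stats["library_size"] < cutoff
    return summary


# Toy count matrix: four genes, three samples.
toy_counts = {
    "GeneA": [120, 0, 95],
    "GeneB": [0, 0, 4],
    "GeneC": [30, 2, 41],
    "GeneD": [7, 0, 0],
}
qc = sample_qc(toy_counts, ["S1", "S2", "S3"])
for sample, stats in qc.items():
    print(sample, stats)
```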
6. Model differential expression or cell states
Bulk: DESeq2 Wald tests with batch covariates (Love et al., 2014). scRNA-seq: clustering and cluster-wise DE. Spatial: Squidpy domains and pseudobulk statistics. Long-read: SQANTI3 filtering plus DESeq2 or DRIMSeq when replicates support isoform testing.
Tools and outputs
Output: deg_results.csv; annotated objects
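
The padj column in a DESeq2 results table is, by default, a Benjamini-Hochberg adjusted p-value. The sketch below reimplements that adjustment in plain Python to show what the column means and how a DEG table might then be filtered. It omits DESeq2's independent filtering step, and the cutoffs (padj < 0.05, |log2FoldChange| >= 1) are assumed example values rather than fixed project thresholds.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(rows, alpha=0.05, min_abs_lfc=1.0):
    """Filter (gene, log2FoldChange, pvalue) rows by padj and effect size."""
    padj = benjamini_hochberg([p for _, _, p in rows])
    return [
        (gene, lfc, q)
        for (gene, lfc, _), q in zip(rows, padj)
        if q < alpha and abs(lfc) >= min_abs_lfc
    ]


# Toy results: gene, log2FoldChange, raw p-value.
toy_rows = [
    ("GeneA", 2.1, 0.01),
    ("GeneB", -0.4, 0.04),
    ("GeneC", -1.6, 0.03),
    ("GeneD", 0.2, 0.5),
]
for gene, lfc, q in significant_genes(toy_rows):
    print(f"{gene}\tlog2FC={lfc}\tpadj={q:.3f}")
```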
7. Run functional and cell-type interpretation
GO/KEGG enrichment; cell-type annotation; cell2location deconvolution when a reference atlas exists (Yu et al., 2012; Moses & Pachter, 2022).
Tools and outputs
Tools used: clusterProfiler 4.20.0; SingleR 2.12.0
Output: enrichment and annotation tables
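
GO/KEGG over-representation analysis is commonly framed as a one-sided hypergeometric test: given a gene universe, a term's gene set, and a list of selected genes, how surprising is the observed overlap? The sketch below computes that p-value with the standard library only. The function name and toy numbers are illustrative, and multiple-testing correction across terms is left out.

```python
from math import comb


def enrichment_pvalue(universe_size, term_size, selected_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    X counts how many of selected_size genes, drawn without replacement
    from universe_size genes, fall in a term containing term_size genes.
    """
    upper = min(term_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10-gene universe, 4 genes in the term,
# 3 selected genes of which 2 belong to the term.
p = enrichment_pvalue(universe_size=10, term_size=4, selected_size=3, overlap=2)
print(f"p = {p:.4f}")
```

The choice of universe matters as much as the test itself: restricting it to genes that were actually testable in the experiment avoids inflating significance.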
8. Generate publication figures and package deliverables
Export figures at publication resolution; package scripts, environment locks, README, and Methods draft.
Tools and outputs
Output: PDF/SVG figures; Git repository or file bundle
9. Deliver results and support reviewer requests
Transfer deliverables via agreed secure channels. Post-delivery support covers methods clarification and minor revisions; substantial new analyses are scoped separately.

Tools and standards we use

Pepkio pins software versions at kickoff and cites primary references in the Methods draft. Representative tools across modalities:

Transcriptomics tools and standards
Tool	Version	Role	Primary citation
STAR	2.7.11b	Splice-aware bulk RNA-seq alignment	https://doi.org/10.1093/bioinformatics/bts635
Subread/featureCounts	2.0.8	Gene-level read counting	https://doi.org/10.1093/bioinformatics/btt656
DESeq2	1.52.0	Bulk and isoform-level differential expression	https://doi.org/10.1186/s13059-014-0550-8
Cell Ranger	10.0.0	10x scRNA-seq alignment and UMI counting	10x Genomics (2024)
Scanpy	1.12.1	scRNA-seq and spatial object processing (Python)	https://doi.org/10.1186/s13059-017-1382-0
Seurat	5.2.1	scRNA-seq and spatial object processing (R)	https://doi.org/10.1038/s41587-023-01767-y
Space Ranger	3.1.3	10x Visium alignment and spot quantification	10x Genomics (2024)
Squidpy	1.8.1	Spatial statistics, neighborhood graphs, SVG detection	https://doi.org/10.1038/s41592-021-01358-2
minimap2	2.31	Long-read RNA-seq alignment	https://doi.org/10.1093/bioinformatics/bty191
IsoQuant	3.13.0	Long-read transcript discovery and quantification	https://doi.org/10.1038/s41587-022-01565-y
SQANTI3	6.0.1	Long-read isoform classification and filtering	https://doi.org/10.1038/s41592-024-02229-2
clusterProfiler	4.20.0	GO/KEGG enrichment	https://doi.org/10.1089/omi.2011.0118
MultiQC	1.25.1	Aggregated QC reporting	https://doi.org/10.1093/bioinformatics/btw354

Reference builds follow GENCODE v44 (human) and GENCODE vM33 (mouse) unless a project requires a custom annotation. Bulk depth follows ENCODE: ≥30 million aligned reads per replicate (ENCODE Consortium, 2016). Inferential designs typically require ≥3 biological replicates per condition where feasible (Conesa et al., 2016).
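
The two design thresholds above (at least 30 million aligned reads per replicate and at least 3 biological replicates per condition) lend themselves to a quick automated check before modeling. The sketch below assumes a simple list of per-sample records; the field names and toy values are hypothetical.

```python
from collections import Counter

MIN_ALIGNED_READS = 30_000_000  # ENCODE bulk guidance cited above
MIN_REPLICATES = 3              # per condition, where feasible


def check_design(samples):
    """Report samples below the depth target and under-replicated conditions."""
    low_depth = [
        s["sample_id"] for s in samples if s["aligned_reads"] < MIN_ALIGNED_READS
    ]
    per_condition = Counter(s["condition"] for s in samples)
    under_replicated = sorted(
        cond for cond, n in per_condition.items() if n < MIN_REPLICATES
    )
    return {"low_depth": low_depth, "under_replicated": under_replicated}


# Toy cohort: three controls (one shallow) and only two treated samples.
cohort = [
    {"sample_id": "C1", "condition": "control", "aligned_reads": 35_000_000},
    {"sample_id": "C2", "condition": "control", "aligned_reads": 31_000_000},
    {"sample_id": "C3", "condition": "control", "aligned_reads": 28_000_000},
    {"sample_id": "T1", "condition": "treated", "aligned_reads": 40_000_000},
    {"sample_id": "T2", "condition": "treated", "aligned_reads": 33_000_000},
]
print(check_design(cohort))
```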

Common challenges — and how we handle them

Researchers often struggle with pipeline choice, batch confounding, incomplete reporting, QC imbalances, and modality mismatch. Pepkio addresses each with version-pinned workflows and checkpoint reviews.

Pipeline and tool choice overload: In a systematic comparison, 192 pipeline combinations applied to 18 samples yielded materially different quantification and DE results (Corchete et al., 2020). Pepkio selects version-pinned tool chains at kickoff and documents choices in the Methods draft.
Batch effects and confounded processing: Technical variation can mask biological signal (Luecken & Theis, 2019; Deshpande et al., 2023). Pepkio includes batch in design formulas where it is not confounded with condition and reviews PCA before modeling.
Insufficient methodological reporting: Only 25% of RNA-seq articles describe all essential computational steps (Simoneau & Scott, 2021). Pepkio delivers scripts, parameter logs, and a Methods draft with exact versions.
Hidden quality imbalances between groups: 35% of 40 clinical RNA-seq datasets had significant quality imbalances between groups (Sprang et al., 2024). Pepkio audits per-sample QC and flags asymmetry before testing.
Modality mismatch: Bulk RNA-seq averages across cell types; heterogeneous tissues may need scRNA-seq or spatial resolution (Conesa et al., 2016; Moses & Pachter, 2022). Pepkio advises on modality at kickoff.
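
The batch point above hinges on whether batch and condition can be separated at all. If every batch contains samples from only one condition, a design such as ~ batch + condition is not full rank and the batch effect cannot be estimated apart from the biology. A minimal check, assuming hypothetical batch and condition fields in the sample metadata, might look like this:

```python
from collections import defaultdict


def batch_condition_table(samples):
    """Map each batch to the set of conditions observed in it."""
    table = defaultdict(set)
    for s in samples:
        table[s["batch"]].add(s["condition"])
    return dict(table)


def is_confounded(samples):
    """True when there are several batches and each holds a single condition."""
    table = batch_condition_table(samples)
    if len(table) < 2:
        return False  # a single batch leaves no batch effect to separate
    return all(len(conditions) == 1 for conditions in table.values())


# Confounded: all controls in batch A, all treated samples in batch B.
confounded = [
    {"sample_id": "S1", "condition": "control", "batch": "A"},
    {"sample_id": "S2", "condition": "control", "batch": "A"},
    {"sample_id": "S3", "condition": "treated", "batch": "B"},
    {"sample_id": "S4", "condition": "treated", "batch": "B"},
]
# Balanced: each batch contains both conditions.
balanced = [
    {"sample_id": "S1", "condition": "control", "batch": "A"},
    {"sample_id": "S2", "condition": "treated", "batch": "A"},
    {"sample_id": "S3", "condition": "control", "batch": "B"},
    {"sample_id": "S4", "condition": "treated", "batch": "B"},
]
print(is_confounded(confounded), is_confounded(balanced))
```

This catches only complete confounding; partially unbalanced designs still deserve a look at the batch-by-condition table and the PCA.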

Common questions

What data do I need to provide for a transcriptomics analysis project?

Provide FASTQs, vendor outputs (Cell Ranger, Space Ranger), or count matrices plus metadata listing condition, batch, and covariates. Pepkio confirms chemistry and reference build at kickoff. Bulk RNA-seq: ENCODE recommends ≥30 million aligned reads per replicate (ENCODE Consortium, 2016). Custom formats are accepted when scoped in advance.

How long does transcriptomics analysis take at Pepkio?

Bulk 2–4 weeks; scRNA-seq 3–5 weeks; spatial or long-read 4–6 weeks; multi-cohort or HD spatial 5–10 weeks. Exact timelines are confirmed at kickoff.

What do Pepkio transcriptomics deliverables look like?

Count matrices or annotated objects, DEG tables, PDF/SVG figures, HTML QC report, commented scripts, and a Methods draft citing tool versions.

Can Pepkio handle my specific sequencing platform or instrument?

Yes for Illumina bulk, 10x Chromium, Visium/Xenium, and PacBio or ONT long-read data. Visium HD, BD Rhapsody, Parse, CosMx, and pre-built matrices are supported when scoped at kickoff.

What if my RNA-seq data quality is poor?

Sub-threshold libraries are analyzed with limitations documented in the QC report (ENCODE Consortium, 2016). Outliers are flagged before modeling; re-sequencing is discussed when yield threatens the study question. Condition-group quality imbalances are reported explicitly (Sprang et al., 2024).

Do I receive the analysis code—and do I own it?

Yes—you retain full ownership. Pepkio delivers commented R/Python scripts with environment lock files via private Git or agreed file transfer.

Can I be involved during the transcriptomics analysis?

Yes. Checkpoint reviews take place after QC, after the sample audit, and before final delivery. You can review contrasts, covariates, and filtering thresholds within agreed scope.

What happens if a journal reviewer requests changes after delivery?

Methods clarification and minor revisions within agreed scope (typically ≤20% of deliverables) are covered. Substantial new analyses are scoped and priced separately.

How do I choose between bulk, single-cell, spatial, and long-read RNA-seq?

Bulk fits stable tissue composition; scRNA-seq resolves heterogeneity; spatial preserves architecture; long-read resolves isoforms (Conesa et al., 2016; Luecken & Theis, 2019; Moses & Pachter, 2022; Amarasinghe et al., 2020). Pepkio advises at kickoff.

Can Pepkio run custom or non-standard transcriptomics analyses?

Yes—when scoped at kickoff: custom inputs, fusion calling, trajectory inference, cross-modality integration, alternative DE engines (edgeR, limma-voom), or client-specified outputs.

Related services

Genomics & variant analysis — Integrate expression with variant calls, eQTL mapping, or allele-specific expression when genotype data are available.
Proteomics — Validate transcript-level findings at the protein level in multi-omics studies.
Machine learning — Build predictive models from expression features for biomarker discovery or patient stratification.
Statistical analysis — Experimental design, power estimation, and cohort modeling before library prep.
Bioinformatics consulting — Modality and depth selection, feasibility assessment, and pipeline planning before committing to a full analysis project.

References

Conesa A, Madrigal P, Tarazona S, et al. A survey of best practices for RNA-seq data analysis. Genome Biology. 2016;17(1):13. https://doi.org/10.1186/s13059-016-0881-8 (PMID: 26813401)
Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Molecular Systems Biology. 2019;15(6):e8746. https://doi.org/10.15252/msb.20188746 (PMID: 31217225)
Moses L, Pachter L. Museum of spatial transcriptomics. Nature Methods. 2022;19(5):534–546. https://doi.org/10.1038/s41592-022-01409-2 (PMID: 35273392)
Amarasinghe SL, Su S, Dong X, et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biology. 2020;21(1):30. https://doi.org/10.1186/s13059-020-1935-5 (PMID: 32033565)
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15(12):550. https://doi.org/10.1186/s13059-014-0550-8 (PMID: 25516281)
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–1330. https://doi.org/10.1126/science.aaz1776 (PMID: 32913098)
ENCODE Consortium. ENCODE Guidelines and Best Practices for RNA-Seq (Revised December 2016). https://www.encodeproject.org/documents/cede0cbe-d324-4ce7-ace4-f0c3eddf5972/@@download/attachment/ENCODE%20Best%20Practices%20for%20RNA_v2.pdf
Wang D, Liu Y, Zhang Y, et al. A real-world multi-center RNA-seq benchmarking study using the Quartet and MAQC reference materials. Nature Communications. 2024;15:6167. https://doi.org/10.1038/s41467-024-50420-y (PMID: 39039053)
Sprang M, Möllmann J, Andrade-Navarro MA, Fontaine JF. Overlooked poor-quality patient samples in sequencing data impair reproducibility of published clinically relevant datasets. Genome Biology. 2024;25(1):222. https://doi.org/10.1186/s13059-024-03331-6 (PMID: 39152483)
Simoneau J, Scott MS. Current RNA-seq methodology reporting limits reproducibility. Briefings in Bioinformatics. 2021;22(1):140–145. https://doi.org/10.1093/bib/bbz124 (PMID: 31813948)
Corchete LA, Rojas EA, Alonso-López D, et al. Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Scientific Reports. 2020;10:19737. https://doi.org/10.1038/s41598-020-76881-x (PMID: 33184454)
Sun L, Guan Z, Wei S, Tan R, Li P, Yan L. Identification of long non-coding and messenger RNAs differentially expressed between primary and metastatic melanoma. Frontiers in Genetics. 2019;10:292. https://doi.org/10.3389/fgene.2019.00292 (PMID: 31024618)
Liu B, Hu X, Feng K, et al. Temporal single-cell tracing reveals clonal revival and expansion of precursor exhausted T cells during anti-PD-1 therapy in lung cancer. Nature Cancer. 2022;3(1):108–121. https://doi.org/10.1038/s43018-021-00292-8 (PMID: 35121991)
Arora R, Cao C, Kumar M, et al. Spatial transcriptomics reveals distinct and conserved tumor core and edge architectures that predict survival and targeted therapy response. Nature Communications. 2023;14(1):4529. https://doi.org/10.1038/s41467-023-40271-4 (PMID: 37596273)
Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. https://doi.org/10.1093/bioinformatics/bts635 (PMID: 23104886)
Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–930. https://doi.org/10.1093/bioinformatics/btt656 (PMID: 24227677)
Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–287. https://doi.org/10.1089/omi.2011.0118 (PMID: 22455463)
Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. https://doi.org/10.1093/bioinformatics/btw354 (PMID: 27312411)
Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28(16):2184–2185. https://doi.org/10.1093/bioinformatics/bts356 (PMID: 22743226)
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–3100. https://doi.org/10.1093/bioinformatics/bty191 (PMID: 29750242)
Pardo-Palacios FJ, Arzalluz-Luque Á, Kondratova L, et al. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nature Methods. 2024;21(5):793–797. https://doi.org/10.1038/s41592-024-02229-2 (PMID: 38509328)
Palla G, Spitzer H, Klein M, et al. Squidpy: a scalable framework for spatial omics analysis. Nature Methods. 2022;19(2):171–178. https://doi.org/10.1038/s41592-021-01358-2 (PMID: 35102346)
Deshpande D, Chhugani K, Chang Y, et al. RNA-seq data science: from raw data to effective interpretation. Frontiers in Genetics. 2023;14:997383. https://doi.org/10.3389/fgene.2023.997383 (PMID: 36999049)
10x Genomics. Cell Ranger and Space Ranger release notes (2024). https://www.10xgenomics.com/support/software/cell-ranger/downloads

Individual services

Deep-dive pages for specific transcriptomics methods and workflows.

Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
