Single-Cell RNA Sequencing (scRNA-seq) Analysis Service — Cell-Level Resolution from Raw FASTQs to Annotated UMAP

Single-cell RNA sequencing (scRNA-seq) profiles expression per cell to resolve heterogeneous tissues and rare populations that bulk RNA-seq averages away (Luecken & Theis, 2019). Pepkio delivers version-pinned analysis from FASTQs to annotated UMAPs, with support for custom inputs, outputs, and non-standard workflows. The service is designed for academic, biotech, and pharma clients running 10x Chromium libraries sequenced at ≥20,000 read pairs per cell (10x Genomics, 2024). Documented scripts, figures, and a Methods draft are included.

Key facts

Key facts about Single-Cell RNA-seq
Fact	Value
Supported platforms / instruments	Primary: 10x Genomics Chromium (3′ v3.1/v4, 5′ v2/v3, GEM-X, Flex). BD Rhapsody WTA, Parse Biosciences Evercode WT, and plate-based SMART-seq2/3 on request
Input requirements	≥20,000 read pairs/cell (3′/5′ standard); ≥10,000 read pairs/cell (Flex); 500–20,000 cells recovered per 10x library; >90% viability recommended (10x Genomics, 2024)
Reference builds supported	Human GRCh38-2024-A (GENCODE v44 / Ensembl 110); mouse GRCm39-2024-A (GENCODE vM33 / Ensembl 110); custom references on request
Primary tools (with versions)	Cell Ranger 10.0.0; Scanpy 1.12.1; Seurat 5.2.1; scvi-tools 1.4.3; SoupX 1.6.2; scDblFinder 1.18.0; SingleR 2.12.0; harmonypy 1.2.3
Typical turnaround time	3–5 weeks (standard single-cohort project); 5–8 weeks (multi-cohort integration or >100,000 cells) — confirmed at kickoff
Deliverable formats	.h5ad, .rds, .csv matrices; PDF/SVG figures; HTML QC report; documented R/Python scripts; Methods draft
Key cited best-practice reference	Luecken & Theis (2019), Molecular Systems Biology; Heumos et al. (2023), Nature Reviews Genetics
Custom / bespoke analysis	Non-standard inputs, outputs, and methods scoped at kickoff—e.g., custom matrices, client-specified figures/tables, trajectory or cross-modality extensions

What is single-cell RNA sequencing (scRNA-seq)?

scRNA-seq assigns sequenced transcripts to individual cell barcodes, producing a UMI count matrix rather than a tissue-level average. Droplet platforms such as 10x Chromium recover 500–10,000 cells per lane—up to ~20,000 with GEM-X (10x Genomics, 2024)—resolving rare cell populations that bulk RNA-seq cannot separate (Luecken & Theis, 2019). Reference atlases now exceed 100,000 cells per study; Tabula Sapiens profiled nearly 500,000 cells across 24 human tissues (Tabula Sapiens Consortium, 2022). Pepkio starts from FASTQs or count matrices and returns annotated objects with documented QC at each step. Projects can extend beyond the standard workflow: custom inputs, deliverable formats, and analyses are agreed at kickoff. See the scRNA-seq glossary.

When should you use single-cell RNA sequencing (scRNA-seq)?

scRNA-seq fits when variation lives at cell-type or cell-state resolution—heterogeneous tumors, inflamed tissue, developing organs, or immune repertoires. The table contrasts scRNA-seq with bulk RNA-seq and spatial transcriptomics.

Comparison of scRNA-seq, bulk RNA-seq, and spatial transcriptomics
Approach	Best for	Limitations	Approximate cost range
scRNA-seq (droplet)	Cell-type discovery, rare populations, trajectory inference, multi-condition composition shifts	Dissociation stress and ambient RNA artifacts; no native spatial context; higher per-cell cost than bulk	Library prep + sequencing and bioinformatics analysis vary widely by cell count, depth, and integration scope
Bulk RNA-seq	Condition contrasts (treated vs. control) when tissue is homogeneous or deconvolution is sufficient	Averages across cell types; rare populations below detection	Lower per-sample cost than scRNA-seq for modest cohorts
Spatial transcriptomics (e.g., 10x Visium, Xenium)	Tissue architecture, cell–cell niches, region-specific expression	Lower depth per cell; platform-specific capture; sectioning constraints	Higher per-study cost than dissociated scRNA-seq

COVID-19 lung immunology: Liao et al. (2020) found severe COVID-19 bronchoalveolar lavage enriched for proinflammatory macrophages, while moderate cases had clonally expanded CD8⁺ T cells.
Pulmonary fibrosis: Aran et al. (2019) identified a transitional profibrotic macrophage in mouse lung whose human orthologues are upregulated in idiopathic pulmonary fibrosis.
Cross-tissue immune architecture: Tabula Sapiens (2022) profiled 24 tissues from shared donors, enabling clonal T-cell tracking across organs.

How the analysis works — step by step

1. Validate inputs and sample metadata
Pepkio confirms FASTQ integrity, read structure (Read 1: 28 bp barcode/UMI; Read 2: ≥90 bp cDNA for 10x 3′), and metadata. Chemistry, expected recovery, depth, and covariates are recorded in sample_manifest.csv. Sub-threshold depth is flagged before alignment (10x Genomics, 2024).
Tools and outputs
Tools used: fastqc / fastp as needed
Output: sample_manifest.csv with library IDs, chemistry, read counts, and QC flags
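The depth check in this step can be sketched in a few lines of Python. This is a minimal illustration, not Pepkio's actual validation script: the column names (library_id, chemistry, expected_cells, read_pairs), the chemistry keys, and the flag labels are assumptions made for the example. Only the 20,000 and 10,000 read-pair minimums come from the text above.

```python
import csv
import io

# Minimum read pairs per cell quoted in the Key facts table.
# The chemistry keys ("3p", "5p", "flex") are hypothetical labels.
MIN_READ_PAIRS_PER_CELL = {"3p": 20_000, "5p": 20_000, "flex": 10_000}


def flag_low_depth(manifest_rows):
    """Annotate each manifest row with read pairs per cell and a depth flag."""
    flagged = []
    for row in manifest_rows:
        per_cell = int(row["read_pairs"]) / int(row["expected_cells"])
        threshold = MIN_READ_PAIRS_PER_CELL[row["chemistry"]]
        flagged.append({
            **row,
            "read_pairs_per_cell": round(per_cell),
            "depth_flag": "OK" if per_cell >= threshold else "LOW_DEPTH",
        })
    return flagged


# Made-up manifest content for illustration only.
example_csv = """library_id,chemistry,expected_cells,read_pairs
LIB01,3p,8000,200000000
LIB02,flex,10000,60000000
"""
rows = list(csv.DictReader(io.StringIO(example_csv)))
checked = flag_low_depth(rows)
for r in checked:
    print(r["library_id"], r["read_pairs_per_cell"], r["depth_flag"])
```

Running the check before alignment means a shallow library is flagged before compute time is spent on it.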
2. Align reads and generate count matrices
For 10x data, Pepkio runs cellranger count or cellranger multi with GRCh38-2024-A or GRCm39-2024-A references (10x Genomics, 2024). Saturation, median genes per cell, and fraction reads in cells are compared against vendor expected ranges.
Tools and outputs
Tools used: Cell Ranger 10.0.0
Output: filtered_feature_bc_matrix/, raw_feature_bc_matrix/, metrics_summary.csv, web_summary.html
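Comparing run metrics against expected ranges starts with reading metrics_summary.csv, where values are typically formatted for display (thousands separators, percent signs). The sketch below assumes that formatting and uses column names of the kind seen in Cell Ranger count summaries. The exact headers can differ by pipeline and version, and the review floor shown is a placeholder rather than a vendor specification.

```python
import csv
import io


def parse_metric(value):
    """Convert '8,412' or '48.3%' to a float; percentages become fractions."""
    text = value.strip().replace(",", "")
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


def load_metrics(csv_text):
    """Read the single data row of a metrics summary into {name: float}."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    return {name: parse_metric(value) for name, value in row.items()}


# Made-up values for illustration only.
example = (
    "Estimated Number of Cells,Median Genes per Cell,"
    "Sequencing Saturation,Fraction Reads in Cells\n"
    '"8,412","2,105",48.3%,91.7%\n'
)
metrics = load_metrics(example)

# Hypothetical review floor; real limits depend on chemistry and tissue.
REVIEW_FLOORS = {"Fraction Reads in Cells": 0.70}
flags = [name for name, floor in REVIEW_FLOORS.items() if metrics[name] < floor]
print(metrics, flags)
```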
3. Import and audit Cell Ranger outputs
Count matrices are imported with raw UMI counts preserved in a dedicated layer. Pepkio audits cell calling, saturation curves, and gene detection distributions. Non-10x count matrices (BD Rhapsody, Parse Evercode, SMART-seq2/3) are imported via anndata or Seurat::CreateSeuratObject when provided.
Tools and outputs
Tools used: Scanpy 1.12.1 or Seurat 5.2.1
Output: Per-sample .h5ad or .rds with counts layer and initial metadata
4. Correct ambient RNA
Cell-free RNA contaminates droplet matrices and can misassign marker genes (Young & Behjati, 2020; Heumos et al., 2023). When raw and filtered Cell Ranger matrices are available, Pepkio estimates contamination with SoupX and produces background-corrected counts. Elevated estimated soup fractions are flagged for review before clustering.
Tools and outputs
Tools used: SoupX 1.6.2
Output: Corrected count matrix; per-sample soup_fraction in metadata; SoupX diagnostic plots
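The idea behind ambient correction can be illustrated with a simplified model: the ambient profile is estimated from empty droplets, and each cell's expected contamination is its soup fraction times its UMI total times that profile. The pure-Python sketch below is a teaching simplification under those assumptions, not SoupX's implementation, which estimates the contamination fraction from the data and adjusts counts more carefully. The function names and numbers are hypothetical.

```python
def estimate_soup_profile(empty_droplet_counts):
    """Gene fractions of the ambient pool, from summed empty-droplet counts."""
    total = sum(empty_droplet_counts.values())
    return {gene: n / total for gene, n in empty_droplet_counts.items()}


def subtract_ambient(cell_counts, soup_profile, rho):
    """Remove the expected ambient counts from one cell.

    rho is the assumed fraction of the cell's UMIs that come from the soup.
    """
    n_umis = sum(cell_counts.values())
    corrected = {}
    for gene, observed in cell_counts.items():
        expected_ambient = rho * n_umis * soup_profile.get(gene, 0.0)
        corrected[gene] = max(0.0, observed - expected_ambient)  # never negative
    return corrected


# Made-up counts: HBB dominates the ambient pool, CD3E is a true T-cell marker.
soup = estimate_soup_profile({"HBB": 600, "CD3E": 100, "LYZ": 300})
cell = {"HBB": 12, "CD3E": 80, "LYZ": 8}
corrected = subtract_ambient(cell, soup, rho=0.10)
print(corrected)
```

In this toy cell, half of the HBB signal is attributed to background while the CD3E signal is almost untouched, which is the behavior that protects marker-gene assignment.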
5. Detect and flag doublets
Two or more cells captured in the same droplet share one barcode, creating hybrid transcriptomes that distort clustering (Wolock et al., 2019). Pepkio runs scDblFinder or Scrublet per sample, not on merged objects (Germain et al., 2022; Heumos et al., 2023). Expected multiplet rates are ~0.8% per 1,000 recovered cells on Next GEM and ~0.4% per 1,000 on GEM-X (10x Genomics, 2024). Predicted doublets are flagged in metadata.
Tools and outputs
Tools used: scDblFinder 1.18.0 or Scrublet 0.2.3
Output: doublet_score, predicted_doublet columns; doublet score histograms
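The per-1,000-cell rates quoted above translate into an expected multiplet fraction for a given recovery. The sketch below assumes simple linear scaling, which is the usual rule of thumb rather than an exact model. The chemistry keys and function names are hypothetical.

```python
# Multiplet rate per 1,000 recovered cells, as quoted in the text.
MULTIPLET_RATE_PER_1000_CELLS = {"next_gem": 0.008, "gem_x": 0.004}


def expected_multiplet_rate(recovered_cells, chemistry):
    """Expected fraction of barcodes that are multiplets (linear approximation)."""
    return MULTIPLET_RATE_PER_1000_CELLS[chemistry] * recovered_cells / 1000


def expected_doublet_count(recovered_cells, chemistry):
    """Approximate number of multiplet barcodes in one library."""
    rate = expected_multiplet_rate(recovered_cells, chemistry)
    return round(rate * recovered_cells)


for chem in ("next_gem", "gem_x"):
    rate = expected_multiplet_rate(10_000, chem)
    print(chem, f"{rate:.1%}", expected_doublet_count(10_000, chem))
```

An expected rate like this is useful as a sanity check on detector output: a predicted doublet fraction far above or below it is worth a closer look before cells are excluded.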
6. Filter low-quality cells
Cells with extreme mitochondrial fractions, low gene complexity, or empty-droplet profiles are removed using sample-adaptive thresholds, because optimal QC boundaries vary by tissue and dissociation protocol (Luecken & Theis, 2019). Retained and excluded counts are documented per filter rule.
Tools and outputs
Tools used: Scanpy 1.12.1 or Seurat 5.2.1
Output: Filtered object; QC plots for nCount_RNA, nFeature_RNA, percent.mt, percent.ribo
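One common way to make thresholds sample-adaptive is to set them a fixed number of median absolute deviations (MADs) from each sample's median. The sketch below assumes that approach with a 3-MAD cutoff. The source does not state which rule Pepkio applies, so the method, the cutoff, and the field names are illustrative assumptions.

```python
from statistics import median


def mad_bounds(values, n_mads=3.0):
    """Return (lower, upper) bounds at n_mads median absolute deviations."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return center - n_mads * mad, center + n_mads * mad


def filter_cells(cells, n_mads=3.0):
    """Split barcodes into kept/removed using per-sample MAD thresholds."""
    _, max_mt = mad_bounds([c["percent_mt"] for c in cells], n_mads)
    min_genes, _ = mad_bounds([c["n_genes"] for c in cells], n_mads)
    kept, removed = [], []
    for c in cells:
        passes = c["percent_mt"] <= max_mt and c["n_genes"] >= min_genes
        (kept if passes else removed).append(c["barcode"])
    return kept, removed


# Made-up cells from one sample; c6 looks like a damaged cell.
cells = [
    {"barcode": "c1", "n_genes": 2000, "percent_mt": 4.0},
    {"barcode": "c2", "n_genes": 2100, "percent_mt": 5.0},
    {"barcode": "c3", "n_genes": 1900, "percent_mt": 3.5},
    {"barcode": "c4", "n_genes": 2050, "percent_mt": 4.5},
    {"barcode": "c5", "n_genes": 1950, "percent_mt": 5.5},
    {"barcode": "c6", "n_genes": 300, "percent_mt": 35.0},
]
kept, removed = filter_cells(cells)
print(kept, removed)
```

Because the bounds are computed per sample, a tissue with naturally high mitochondrial content is not penalized by a cutoff tuned on a different tissue.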
7. Normalize and select highly variable genes
For R workflows, Pepkio applies SCTransform v2, modeling sequencing depth and returning Pearson residuals for PCA (Hafemeister & Satija, 2019). Python workflows use sc.pp.normalize_total and HVG selection with the seurat_v3 flavor (Wolf et al., 2018). HVG sets and parameters are recorded for reproducibility.
Tools and outputs
Tools used: sctransform 0.4.3 (via Seurat 5.2.1) or Scanpy 1.12.1
Output: Normalized layers; highly_variable_genes.csv
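Total-count normalization, the operation behind sc.pp.normalize_total, rescales every cell to the same UMI total so that depth differences do not masquerade as expression differences. The pure-Python sketch below shows the arithmetic only. The target of 10,000 and the log1p follow-up are conventional choices assumed here, not parameters stated in the text.

```python
import math


def normalize_total(cell_counts, target_sum=10_000):
    """Scale one cell's counts so they sum to target_sum."""
    scale = target_sum / sum(cell_counts.values())
    return {gene: n * scale for gene, n in cell_counts.items()}


def log1p_transform(values):
    """Apply log(1 + x) to dampen the influence of highly expressed genes."""
    return {gene: math.log1p(v) for gene, v in values.items()}


# Two made-up cells with identical composition but 10x different depth.
shallow = {"GENE_A": 10, "GENE_B": 90}
deep = {"GENE_A": 100, "GENE_B": 900}

norm_shallow = normalize_total(shallow)
norm_deep = normalize_total(deep)
logged = log1p_transform(norm_shallow)
print(norm_shallow, norm_deep)
```

After scaling, the two cells are indistinguishable, which is the intended outcome when the only difference between them is sequencing depth. Note that Scanpy's seurat_v3 HVG flavor expects raw counts, so the counts layer should be kept alongside the normalized values.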
8. Integrate batches across samples
Harmony integrates same-modality batches with shared cell types (Korsunsky et al., 2019). scVI handles atlas-level integration where compositional differences confound linear methods (Lopez et al., 2018; Gayoso et al., 2022; Luecken et al., 2022). Marker-gene preservation is checked after integration to confirm that distinct biological states have not been over-merged.
Tools and outputs
Tools used: harmonypy 1.2.3 or scvi-tools 1.4.3
Output: X_harmony or X_scVI embedding; before/after UMAP by batch and condition
9. Cluster, embed, and annotate cell types
Pepkio builds a neighbor graph, runs Leiden community detection at data-driven resolution (validated with marker genes), and computes UMAP. Cell types are assigned by reference mapping with SingleR (Aran et al., 2019), followed by manual marker review. Ambiguous clusters receive provisional labels with supporting evidence.
Tools and outputs
Tools used: Scanpy 1.12.1 or Seurat 5.2.1; SingleR 2.12.0; leidenalg 0.10.2
Output: Cluster assignments; UMAP/t-SNE plots; marker_gene_table.csv
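Per-cell reference labels are often summarized per cluster before manual review. The sketch below assumes a simple majority vote with an agreement threshold, marking clusters below the threshold as provisional. The 0.6 cutoff, the function name, and the status labels are hypothetical and are not described in the source as Pepkio's rule.

```python
from collections import Counter, defaultdict


def label_clusters(cell_clusters, cell_labels, min_fraction=0.6):
    """Summarize per-cell labels into one label per cluster.

    Returns {cluster: (top_label, fraction_of_cells, status)}.
    """
    by_cluster = defaultdict(list)
    for cluster, label in zip(cell_clusters, cell_labels):
        by_cluster[cluster].append(label)

    summary = {}
    for cluster, labels in by_cluster.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        fraction = top_count / len(labels)
        status = "assigned" if fraction >= min_fraction else "provisional"
        summary[cluster] = (top_label, round(fraction, 2), status)
    return summary


# Made-up per-cell labels for two clusters.
clusters = ["0"] * 5 + ["1"] * 4
labels = ["T cell"] * 4 + ["NK cell"] + ["Monocyte", "Monocyte", "Macrophage", "DC"]
summary = label_clusters(clusters, labels)
print(summary)
```

Recording the agreement fraction alongside the label gives the manual marker review a ranked list of clusters that need the most attention.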
10. Test differential expression and package deliverables
Cluster-wise or condition-wise DE uses Wilcoxon rank-sum tests with Benjamini–Hochberg FDR correction (Luecken & Theis, 2019). Seurat workflows use the presto implementation on large objects when installed (Hao et al., 2023). Results export as ranked gene lists with log₂ fold-change, detection rate, and adjusted p-values. Pseudotime trajectory analysis is scoped separately when requested.
Tools and outputs
Tools used: Scanpy rank_genes_groups or Seurat FindMarkers with presto
Output: deg_by_cluster.csv; volcano plots; final .h5ad/.rds; scripts; Methods draft
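The Benjamini–Hochberg adjustment applied to the test p-values can be written out directly. This is a minimal stdlib sketch of the standard step-up procedure for illustration. Scanpy and Seurat compute adjusted p-values internally, so the function name here is not part of either library.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four genes.
raw = [0.001, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
```

In this toy example the genes with raw p-values of 0.03 and 0.04 end up with the same adjusted value, because the procedure never lets a smaller raw p-value receive a larger adjusted one.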

What Pepkio delivers

Processed data files

.h5ad, .rds, count matrices (.csv/MTX), Cell Ranger filtered_feature_bc_matrix.h5, metrics_summary.csv, and per-cell metadata (sample_id, batch, condition, QC metrics, cluster, cell_type).

Figures (PDF/SVG)

QC plots, SoupX/doublet diagnostics, UMAP/t-SNE (cluster, cell type, sample, batch), marker heatmaps, composition bars, DE volcano plots.

Tables

cell_metadata.csv, sample_qc_summary.csv, marker_gene_table.csv, deg_by_cluster.csv.

Code

Standalone, commented R and Python scripts per analysis stage
Environment lock files: sessionInfo(), conda env export, or pip freeze
Delivery via private Git repository or agreed file transfer

Documentation

HTML/PDF QC report with thresholds and exclusion counts
README with reproduction instructions from raw FASTQs
Journal-formatted Methods draft citing exact software versions
Custom or bespoke analysis milestones beyond the standard pipeline, with inputs, outputs, and methods defined at kickoff
Post-delivery reviewer support: clarification of methods and minor revisions within agreed scope (typically ≤20% of deliverables)

Technical decisions we make — and why

Normalization: SCTransform v2 (R) or sc.pp.normalize_total with seurat_v3 HVG selection (Python): SCTransform's Pearson residuals remove depth confounding while preserving biological variance (Hafemeister & Satija, 2019).
Ambient RNA: SoupX before filtering: When raw and filtered matrices exist; CellBender on request (Young & Behjati, 2020).
Doublets: scDblFinder per sample: Not global UMI cutoffs (Germain et al., 2022; Luecken & Theis, 2019).
Batch correction: Harmony within study; scVI across datasets: Compositional differences favor scVI (Korsunsky et al., 2019; Gayoso et al., 2022; Luecken et al., 2022).
DE: Wilcoxon + BH-FDR: MAST on request for model-based contrasts (Luecken & Theis, 2019).

Common questions

What is the minimum number of cells and sequencing depth for scRNA-seq analysis?

For standard 10x Chromium 3′ or 5′ libraries, Pepkio recommends ≥20,000 read pairs per cell (10x Genomics, 2024). Chromium Flex requires ≥10,000 read pairs per cell. Hundreds of high-quality cells per biological condition support stable clustering; fewer cells can be analyzed but power for rare populations drops. Exact targets are confirmed at kickoff.
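The per-cell minimums translate into a sequencing budget per library by simple multiplication. The sketch below assumes a hypothetical target of 8,000 recovered cells. Only the 20,000 and 10,000 read-pair figures come from the text.

```python
# Minimum read pairs per cell from the answer above; keys are hypothetical labels.
MIN_READ_PAIRS_PER_CELL = {"standard_3p_5p": 20_000, "flex": 10_000}


def required_read_pairs(target_cells, library_type):
    """Minimum read pairs to request for one library."""
    return target_cells * MIN_READ_PAIRS_PER_CELL[library_type]


standard_budget = required_read_pairs(8_000, "standard_3p_5p")
flex_budget = required_read_pairs(8_000, "flex")
print(f"standard: {standard_budget:,} read pairs")
print(f"flex:     {flex_budget:,} read pairs")
```

Budgeting on the expected recovery rather than the loaded cell number avoids overestimating how many reads each recovered cell will receive.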

Can you analyze low-viability or low-yield samples?

Yes, with caveats documented in the QC report. Samples below recommended viability often show elevated mitochondrial fractions and stress genes (FOS, JUN) that can affect clustering (Luecken & Theis, 2019). Cells with very low UMI counts are typically excluded. We discuss re-sequencing or pooling before committing to full downstream analysis.

Do you support 10x Chromium, BD Rhapsody, and Parse Evercode data?

Yes. 10x Chromium 3′, 5′, GEM-X, and Flex are processed via Cell Ranger 10.0.0. BD Rhapsody, Parse Evercode, and SMART-seq2/3 count matrices can be imported into Scanpy or Seurat on request, with barcode and feature handling matched to the platform chemistry.

How long does scRNA-seq analysis take at Pepkio?

A standard single-cohort project (roughly 4–8 samples, one tissue, no cross-study integration) typically completes in 3–5 weeks from data receipt. Multi-cohort integration, atlas-scale datasets (>100,000 cells), or CITE-seq extensions may take 5–8 weeks. Weekly milestone check-ins are included, and exact timelines are confirmed at kickoff.

How do you handle batch effects across patients or sequencing runs?

Harmony corrects within-study technical batches when shared cell types are present (Korsunsky et al., 2019). Cross-study integration uses scVI (Lopez et al., 2018; Luecken et al., 2022). Donor, age, and sex covariates stay in metadata and are stratified or regressed per design. Integration quality is validated by marker-gene preservation.

Do I own the code — and in what format is it delivered?

Yes — you retain full ownership of all code, scripts, and results. Pepkio delivers commented R/Python scripts and environment lock files (sessionInfo(), conda, or pip). Objects use standard .h5ad and .rds formats readable in Scanpy or Seurat; R Markdown or Jupyter delivery is available on request.

Can I be involved during analysis?

Yes. Checkpoint reviews occur after QC, clustering, and before final delivery. You can review annotations, adjust cluster resolution, and request contrasts within agreed scope. A PhD-level scientific contact leads the project and incorporates your tissue-specific knowledge.

What does post-delivery reviewer support include?

Support covers clarification of computational methods, QC thresholds, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Pepkio drafts Methods and Supplementary text for analyses we performed. Substantial new analyses requested by reviewers are scoped separately.

Is co-authorship required?

No. Pepkio operates as a fee-for-service provider and does not require co-authorship unless explicitly discussed in advance. Standard practice is acknowledgment of bioinformatics support in the Acknowledgments section; co-authorship is considered only when Pepkio scientists make substantial intellectual contributions beyond routine analysis.

Should I use Cell Ranger or an open-source aligner (STARsolo, kallisto|bustools)?

For 10x data, Pepkio defaults to Cell Ranger for chemistry-specific barcode handling and vendor QC metrics (10x Genomics, 2024). STARsolo and kallisto|bustools suit non-10x chemistries or open-source requirements. The Methods draft states aligner and reference build used.

How do you detect and remove doublets — and what doublet rate should I expect?

Pepkio detects doublets with scDblFinder or Scrublet per sample after ambient RNA correction (Germain et al., 2022; Wolock et al., 2019). Expected multiplet rates are ~0.8% per 1,000 cells on Next GEM and ~0.4% on GEM-X (10x Genomics, 2024), which scales to roughly 8% (Next GEM) or 4% (GEM-X) at 10,000 recovered cells before removal. Predicted doublets are flagged; exclusion counts are documented.

Can you integrate my scRNA-seq with existing bulk RNA-seq or spatial data?

Cross-modality integration—pseudobulk cluster aggregation, deconvolution validation, or label transfer to Visium/Xenium spots—is available as a separately scoped milestone (Heumos et al., 2023). Scope, timeline, and deliverables are defined at kickoff based on your reference datasets and biological questions.
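Pseudobulk aggregation, the first of the approaches listed above, sums raw counts over all cells that share a sample and cluster so the result can be compared with bulk RNA-seq profiles. The sketch below is a minimal stdlib illustration. The record layout (sample_id, cluster, counts) is an assumption for the example, not a Pepkio file format.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts across cells sharing a (sample_id, cluster) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample_id"], cell["cluster"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


# Made-up cells from two samples.
cells = [
    {"sample_id": "S1", "cluster": "T cell", "counts": {"CD3E": 5, "LYZ": 0}},
    {"sample_id": "S1", "cluster": "T cell", "counts": {"CD3E": 7, "LYZ": 1}},
    {"sample_id": "S1", "cluster": "Monocyte", "counts": {"CD3E": 0, "LYZ": 20}},
    {"sample_id": "S2", "cluster": "T cell", "counts": {"CD3E": 4, "LYZ": 0}},
]
profiles = pseudobulk(cells)
print(profiles)
```

Aggregating raw counts rather than normalized values keeps each sample as the unit of replication, which matters when the profiles feed into bulk-style statistical models.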

Can you handle custom or non-standard scRNA-seq analyses?

Yes. Beyond the standard FASTQ-to-UMAP workflow, Pepkio scopes bespoke work at kickoff—custom inputs (e.g., preprocessed matrices), output formats, uncommon methods, or analyses outside typical clustering and DE. Milestone pricing and timelines are confirmed before work begins.

Related services

Bulk RNA-seq — Condition-level differential expression when tissue homogeneity makes cell-level resolution unnecessary.
Spatial transcriptomics — Retain tissue architecture and microenvironment context that dissociated scRNA-seq loses.
Long-read RNA-seq — Isoform-level resolution for splice variants not captured by 3′ UMI counting.
Multi-omics integration — Joint analysis of scRNA-seq with CITE-seq, scATAC-seq, or proteomics from matched samples.
Custom consulting — Experimental design before library prep: cell loading targets and multiplexing strategy.

References

Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Molecular Systems Biology. 2019;15(6):e8746. https://doi.org/10.15252/msb.20188746 (PMID: 31217225)
Heumos L, Schaar AC, Lance C, et al. Best practices for single-cell analysis across modalities. Nature Reviews Genetics. 2023;24(8):550–572. https://doi.org/10.1038/s41576-023-00586-w
Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology. 2018;19(1):15. https://doi.org/10.1186/s13059-017-1382-0
Hao Y, Hao S, Andersen-Nissen E, et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nature Biotechnology. 2024;42(2):293–304 (published online 2023). https://doi.org/10.1038/s41587-023-01767-y
Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biology. 2019;20(1):296. https://doi.org/10.1186/s13059-019-1874-1
Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nature Methods. 2018;15(12):1053–1058. https://doi.org/10.1038/s41592-018-0229-2
Gayoso A, Lopez R, Xing G, et al. A Python library for probabilistic analysis of single-cell omics data. Nature Biotechnology. 2022;40(2):163–166. https://doi.org/10.1038/s41587-021-01206-w
Korsunsky I, Millard N, Fan J, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods. 2019;16(12):1289–1296. https://doi.org/10.1038/s41592-019-0619-0
Young MD, Behjati S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience. 2020;9(12):giaa151. https://doi.org/10.1093/gigascience/giaa151
Germain P-L, Lun A, Garcia Meixide C, et al. Doublet identification in single-cell sequencing data using scDblFinder. F1000Research. 2022;10:979. https://doi.org/10.12688/f1000research.73600.2
Wolock SL, Lopez R, Klein AM. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Systems. 2019;8(4):281–291.e9. https://doi.org/10.1016/j.cels.2018.11.005
Aran D, Looney AP, Liu L, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nature Immunology. 2019;20(2):163–172. https://doi.org/10.1038/s41590-018-0276-y
10x Genomics. Sequencing Handbook (CG000809 Rev A). 2024. https://cdn.10xgenomics.com/image/upload/v1743440506/support-documents/CG000809_SequencingHandbook_RevA.pdf
10x Genomics. Cell Ranger downloads and release notes (v10.0.0). https://www.10xgenomics.com/support/software/cell-ranger/downloads
Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science. 2022;376(6594):eabl4896. https://doi.org/10.1126/science.abl4896
Liao M, Liu Y, Yuan J, et al. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nature Medicine. 2020;26(6):842–844. https://doi.org/10.1038/s41591-020-0901-9
Luecken MD, Büttner M, Chaichoompu K, et al. Benchmarking atlas-level data integration in single-cell genomics. Nature Methods. 2022;19(1):41–50. https://doi.org/10.1038/s41592-021-01336-8
10x Genomics. Chromium Next GEM Single Cell 3′ v3.1 reagent workflow and data overview (CG000204 Rev D). https://assets.ctfassets.net/an68im79xiti/1eX2FPdpeCgnCJtw4fj9Hx/7cb84edaa9eca04b607f9193162994de/CG000204_ChromiumNextGEMSingleCell3_v3.1_Rev_D.pdf
10x Genomics. Reference release notes (GRCh38-2024-A, GRCm39-2024-A). https://www.10xgenomics.com/support/software/cell-ranger/latest/release-notes/cr-reference-release-notes

Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
