Long-Read RNA Sequencing Analysis Service — Full-Length Isoform Catalogs from HiFi or Nanopore Reads to Filtered GTF

Long-read RNA sequencing resolves full-length isoform structures that short-read RNA-seq cannot reconstruct from fragments (Amarasinghe et al., 2020). Pepkio delivers version-pinned analysis from FASTQ or BAM to SQANTI3-filtered GTF catalogs, with custom inputs and workflows scoped at kickoff. The service is aimed at academic, biotech, and pharma clients, primarily those using PacBio Kinnex at 5–10 million HiFi reads per sample (Pacific Biosciences, 2024). Scripts, figures, and a Methods draft are included.

Key facts

Key facts about Long-Read RNA-seq
Fact	Value
Supported platforms / instruments	Primary: PacBio Revio / Sequel II (Kinnex Iso-Seq HiFi). Oxford Nanopore PromethION / MinION (direct RNA, amplification-free cDNA, PCR-cDNA). Single-cell long-read (MAS-ISO-seq, 10x-compatible cDNA) on request
Input requirements	≥300 ng total RNA, RIN >7, DNase-treated; ≥3 biological replicates recommended for differential expression; 5M HiFi reads/sample (high-abundance isoform discovery), 10M (moderate-to-rare isoforms or isoform-level DE) for PacBio Kinnex (Pacific Biosciences, 2024; UCSD gCore, 2024)
Reference builds supported	Human GRCh38 + GENCODE v44; mouse GRCm39 + GENCODE vM33; custom references on request
Primary tools (with versions)	minimap2 2.31; IsoQuant 3.13.0; SQANTI3 6.0.1; isoseq3 (SMRT Link 13.1); NanoPlot 1.42.0; DESeq2 1.44.0; DRIMSeq 1.30.0; stageR 1.28.0
Typical turnaround time	4–6 weeks (standard bulk cohort, reference-guided); 6–10 weeks (annotation-free, multi-condition isoform DE, or orthogonal short-read validation) — confirmed at kickoff
Deliverable formats	.gtf/.gff3, .bam, count matrices (.tsv/.csv); PDF/SVG figures; HTML QC report; documented R/Python scripts; Methods draft
Key cited best-practice references	Amarasinghe et al. (2020), Genome Biology; Pardo-Palacios et al. (2024), Nature Methods (SQANTI3); LRGASP Consortium (2024), Nature Methods
Custom / bespoke analysis	Non-standard inputs, outputs, and methods scoped at kickoff—e.g., pre-aligned BAM, client GTF, fusion prioritization, ORF prediction, short-read junction validation, single-cell long-read extensions

What is long-read RNA sequencing?

Long-read RNA sequencing reads individual RNA or cDNA molecules end to end, so a single read can span a complete exon chain. Isoforms are therefore observed directly rather than inferred from splice junctions in ~150 bp short-read fragments (Amarasinghe et al., 2020; Chen et al., 2025). Pepkio aligns PacBio HiFi or ONT reads with minimap2, builds transcript models with IsoQuant, and classifies isoforms with SQANTI3 as full-splice match (FSM), incomplete-splice match (ISM), novel in catalog (NIC), or novel not in catalog (NNC) (Li, 2018; Prjibelski et al., 2023; Pardo-Palacios et al., 2024). The SG-NEx project profiled seven human cell lines across five RNA-seq protocols and reported that long-read sequencing identifies major isoforms more robustly than short-read cDNA sequencing (Chen et al., 2025). Custom deliverables beyond the standard workflow are scoped at kickoff. See the long-read RNA-seq glossary.

When should you use long-read RNA sequencing?

Long-read RNA-seq fits when the biological question requires isoform-resolved expression, such as alternative splicing in disease, annotation of uncharacterized loci, or fusion validation. The table contrasts long-read RNA-seq with bulk short-read RNA-seq and short-read scRNA-seq.

Comparison of long-read RNA-seq, bulk RNA-seq, and single-cell RNA-seq
Approach	Best for	Limitations	Approximate cost range
Long-read RNA-seq (PacBio / ONT)	Full-length isoform discovery, novel transcripts, fusion validation, ORF and UTR annotation	Higher per-sample sequencing cost than bulk short-read; artifact-rich catalogs require curation; lower sample throughput	Library prep + sequencing and bioinformatics vary by platform, Kinnex multiplexing, and depth
Bulk short-read RNA-seq	Condition-level gene DE on well-annotated genomes	Cannot resolve overlapping isoforms from junction fragments alone	Lower per-sample cost for modest cohorts
Single-cell RNA-seq (short-read)	Cell-type heterogeneity and rare populations	3′ capture bias; no native full-length isoform resolution	Mid-range; per-sample cost typically exceeds bulk short-read

Aged human brain isoform diversity: Aguzzoli Heberle et al. (2024) mapped medically relevant isoform diversity in aged frontal cortex missed by short-read annotation.
Cross-platform benchmarking: Chen et al. (2025) profiled seven human cell lines across five RNA-seq protocols in the SG-NEx project, showing that long-read data identify major isoforms more robustly than short-read cDNA sequencing.
Pipeline artifacts: Du et al. (2023) showed isoform-caller choice strongly affects false-positive novel isoform rates—supporting SQANTI3 curation before inference.

How the analysis works — step by step

1. Validate inputs and sample metadata
Pepkio confirms FASTQ or BAM integrity, platform, chemistry, and metadata. Read depth, RIN, replicates, and contrast design are recorded in sample_manifest.csv. Sub-threshold yield or truncated reads are flagged before alignment (Pacific Biosciences, 2024; Amarasinghe et al., 2020).
Tools and outputs
Tools used: fastqc / fastp as needed; samtools quickcheck for BAM inputs
Output: sample_manifest.csv with library IDs, platform, read counts, and QC flags
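The metadata checks in this step can be sketched in a few lines of Python. This is a minimal illustration, not Pepkio's validation script: the column names (sample_id, condition, rin, read_count) are hypothetical, and the thresholds are simply the input requirements stated on this page (RIN >7, 5 million HiFi reads, 3 replicates per condition).

```python
import csv
import io

# Thresholds mirror the input requirements stated on this page.
# Column names below are hypothetical, chosen for illustration only.
MIN_RIN = 7.0
MIN_HIFI_READS = 5_000_000
MIN_REPLICATES = 3


def flag_samples(manifest_text):
    """Return {sample_id: [flags]} for every row of a CSV manifest."""
    rows = list(csv.DictReader(io.StringIO(manifest_text)))

    # Count how many samples belong to each condition.
    per_condition = {}
    for row in rows:
        per_condition[row["condition"]] = per_condition.get(row["condition"], 0) + 1

    flags = {}
    for row in rows:
        sample_flags = []
        if float(row["rin"]) <= MIN_RIN:
            sample_flags.append("low_rin")
        if int(row["read_count"]) < MIN_HIFI_READS:
            sample_flags.append("low_depth")
        if per_condition[row["condition"]] < MIN_REPLICATES:
            sample_flags.append("few_replicates")
        flags[row["sample_id"]] = sample_flags
    return flags


example = """sample_id,condition,rin,read_count
s1,ctrl,8.2,6200000
s2,ctrl,6.5,7000000
s3,ctrl,9.0,4100000
s4,treated,8.8,9000000
"""
print(flag_samples(example))
```

Flags are reported rather than used to drop samples, so the decision to re-extract or re-sequence stays with the project team.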
2. QC raw long reads
Read-length distributions, pass-filter rates, and full-length fractions are computed per sample. ONT reports include N50 read length and median quality; PacBio HiFi reports include mean read length and CCS pass rates. Truncation that inflates isoform catalogs is flagged (Amarasinghe et al., 2020; Pardo-Palacios et al., 2024).
Tools and outputs
Tools used: NanoPlot 1.42.0; pycoQC 2.5.2 (ONT); PacBio dataset reports (HiFi)
Output: read_qc_summary.csv; read-length histograms; per-sample QC flags
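For readers unfamiliar with the N50 read length reported for ONT runs, the sketch below shows the standard definition: the length L such that reads of length L or longer contain at least half of all sequenced bases. It is a plain-Python illustration, not the NanoPlot implementation.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are visited from longest to shortest; the N50 is the length of
    the read at which the running base total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("no reads supplied")
    total_bases = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total_bases:
            return length


print(read_n50([2, 3, 4, 5, 6]))
```

Because N50 is weighted by bases rather than by read count, a sample with many truncated reads can still show a high N50; that is one reason full-length fractions are reported alongside it.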
3. Process PacBio subreads to HiFi reads (when applicable)
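When clients deliver PacBio subreads or unprocessed HiFi (CCS) BAMs, Pepkio first generates HiFi reads by circular consensus (subread inputs only) and then runs the Iso-Seq workflow: primer removal and demultiplexing, refinement (poly-A tail trimming and concatemer removal) to full-length non-concatemer (FLNC) reads, and optional clustering of FLNC reads into consensus isoforms (Pacific Biosciences, 2024). Barcode crosstalk and low CCS yield are documented before alignment.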
Tools and outputs
Tools used: isoseq3 (SMRT Link 13.1)
Output: Demultiplexed HiFi .fastq.gz per sample; isoseq_stats.csv
4. Align reads to the reference genome
Reads are mapped in splice-aware mode: minimap2 splice:hq for PacBio HiFi and splice with k-mer size 14 for ONT, with annotated GENCODE splice junctions supplied as BED input (Li, 2018; Prjibelski et al., 2023). Mapping rates, chimeric fractions, and primary vs. secondary alignments are audited per sample.
Tools and outputs
Tools used: minimap2 2.31; samtools 1.21
Output: Coordinate-sorted, indexed .bam per sample; alignment_summary.csv
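As an illustration of how library type drives the alignment preset, the sketch below assembles a minimap2 command line in Python. The presets follow the examples in the minimap2 README (splice:hq with -uf for PacBio Iso-Seq, splice for Nanopore cDNA, splice with -uf -k14 for direct RNA), and --junc-bed is minimap2's option for annotated junctions. The function name, library-type labels, file names, and thread count are hypothetical, and this is not Pepkio's production configuration.

```python
import shlex

# Presets taken from the minimap2 README examples; the dictionary keys
# are hypothetical labels used only in this sketch.
PRESETS = {
    "pacbio_hifi": ["-ax", "splice:hq", "-uf"],
    "ont_cdna": ["-ax", "splice"],
    "ont_direct_rna": ["-ax", "splice", "-uf", "-k14"],
}


def minimap2_command(library_type, reference_fasta, reads, junction_bed=None, threads=8):
    """Build a minimap2 argument list for one sample (does not run it)."""
    if library_type not in PRESETS:
        raise ValueError(f"unknown library type: {library_type}")
    command = ["minimap2", *PRESETS[library_type], "-t", str(threads)]
    if junction_bed:
        # Annotated splice junctions in BED format guide junction placement.
        command += ["--junc-bed", junction_bed]
    command += [reference_fasta, reads]
    return command


print(shlex.join(minimap2_command(
    "ont_direct_rna", "GRCh38.fa", "sample1.fastq.gz", junction_bed="gencode.bed")))
```

Returning a list rather than a shell string keeps the command safe to pass to subprocess.run without shell quoting problems.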
5. Reconstruct and quantify transcript models
IsoQuant runs in reference-guided mode with GENCODE, extending the reference with sample-specific isoforms (Prjibelski et al., 2023). Gene-, isoform-, exon-, and intron-level counts are generated; saturation is compared to platform depth targets (Pacific Biosciences, 2024; Chen et al., 2025).
Tools and outputs
Tools used: IsoQuant 3.13.0
Output: extended_annotation.gtf; isoform_counts.tsv; gene_counts.tsv; saturation curves
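A saturation curve of the kind listed in the outputs can be approximated by subsampling read-to-isoform assignments and counting how many isoforms remain detectable. The sketch below is a simplified illustration: the input is assumed to be one isoform ID per assigned read, and the minimum-support threshold of 2 reads and the subsampling fractions are arbitrary choices for the example, not project defaults.

```python
import random


def saturation_curve(read_assignments, fractions=(0.25, 0.5, 0.75, 1.0),
                     min_reads=2, seed=0):
    """Count isoforms supported by >= min_reads reads at each sampling fraction.

    Nested prefixes of a single shuffle are used, so the curve can never
    decrease as the fraction grows.
    """
    rng = random.Random(seed)
    shuffled = list(read_assignments)
    rng.shuffle(shuffled)

    curve = []
    for fraction in fractions:
        subset = shuffled[: int(len(shuffled) * fraction)]
        counts = {}
        for isoform in subset:
            counts[isoform] = counts.get(isoform, 0) + 1
        detected = sum(1 for n in counts.values() if n >= min_reads)
        curve.append((fraction, detected))
    return curve


# Toy data: one abundant, one moderate, one rare, and one singleton isoform.
reads = ["A"] * 50 + ["B"] * 10 + ["C"] * 2 + ["D"]
print(saturation_curve(reads))
```

A curve that is still rising at the full read set suggests that additional depth would recover more isoforms; a plateau suggests the library is close to saturation at the chosen support threshold.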
6. Collapse redundant isoform models
Long-read pipelines often emit highly redundant transcript models differing by terminal exons or indels (Pardo-Palacios et al., 2024; ConesaLab SQANTI3 wiki). When redundancy exceeds project thresholds, Pepkio collapses near-identical models before SQANTI3 classification.
Tools and outputs
Tools used: TAMA collapse or cDNA_Cupcake collapse_isoforms_by_sam as appropriate
Output: collapsed_annotation.gtf; collapse audit log
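The idea behind collapsing can be illustrated by grouping transcript models that share the same intron chain and differ only at their terminal ends. The sketch below is a deliberately simplified stand-in and is not the algorithm used by TAMA or cDNA_Cupcake, which also apply tolerance windows and read-support rules. The input structure and function names are hypothetical.

```python
from collections import defaultdict


def intron_chain(exons):
    """Convert (start, end) exons into a tuple of (donor, acceptor) introns."""
    ordered = sorted(exons)
    return tuple((ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1))


def genomic_span(exons):
    """Distance from the first exon start to the last exon end."""
    return max(end for _, end in exons) - min(start for start, _ in exons)


def collapse_models(models):
    """Group models sharing chromosome, strand, and intron chain.

    models: {transcript_id: (chrom, strand, [(start, end), ...])}
    Returns {representative_id: [member_ids]}, where the representative
    is the member with the longest genomic span.
    """
    groups = defaultdict(list)
    for transcript_id, (chrom, strand, exons) in models.items():
        chain = intron_chain(exons)
        # Mono-exon models have no introns; keep each one as its own group.
        key = (chrom, strand, chain) if chain else (chrom, strand, transcript_id)
        groups[key].append(transcript_id)

    collapsed = {}
    for members in groups.values():
        representative = max(members, key=lambda tid: genomic_span(models[tid][2]))
        collapsed[representative] = sorted(members)
    return collapsed


models = {
    "t1": ("chr1", "+", [(100, 200), (300, 400)]),
    "t2": ("chr1", "+", [(120, 200), (300, 380)]),  # same introns, shorter ends
    "t3": ("chr1", "+", [(100, 200), (310, 400)]),  # different acceptor site
}
print(collapse_models(models))
```

Keeping the member list for each representative provides the raw material for a collapse audit log.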
7. Classify and filter with SQANTI3
SQANTI3 assigns structural categories (FSM, ISM, NIC, NNC, genic, intergenic, antisense) and quality metrics on TSS, TTS, and splice junctions (Pardo-Palacios et al., 2024). Rules-based filtering is default; ML filtering is documented when selected.
Tools and outputs
Tools used: SQANTI3 6.0.1
Output: SQANTI3_classification.txt; SQANTI3_filter_report.html; filtered corrected.gtf
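The four main structural categories can be understood through a toy classifier that compares a query transcript's splice junctions with the reference. This is a conceptual sketch only: SQANTI3 itself also considers strand, gene overlap, mono-exon models, and additional categories, none of which are handled here. Junctions are represented as (donor, acceptor) coordinate pairs, which is an assumption made for the example.

```python
def classify(query_chain, reference_chains):
    """Assign FSM, ISM, NIC, or NNC to a multi-exon query.

    query_chain: tuple of (donor, acceptor) junctions for the query model.
    reference_chains: list of junction tuples, one per reference transcript.
    """
    if not query_chain:
        raise ValueError("mono-exon models are not handled in this sketch")

    known_junctions = {j for chain in reference_chains for j in chain}
    known_donors = {donor for donor, _ in known_junctions}
    known_acceptors = {acceptor for _, acceptor in known_junctions}

    # FSM: every junction matches one reference transcript exactly.
    if query_chain in reference_chains:
        return "FSM"

    # ISM: the query matches a consecutive subset of a reference chain.
    n = len(query_chain)
    for ref in reference_chains:
        for i in range(len(ref) - n + 1):
            if ref[i:i + n] == query_chain:
                return "ISM"

    # NIC: all donors and acceptors are annotated, but the combination is new.
    if all(d in known_donors and a in known_acceptors for d, a in query_chain):
        return "NIC"

    # NNC: at least one donor or acceptor site is not annotated.
    return "NNC"


reference = [((200, 300), (400, 500), (600, 700))]
print(classify(((400, 500), (600, 700)), reference))
```

In this toy model a 5′-truncated read yields an ISM call, which illustrates why degraded RNA tends to inflate the ISM fraction.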
8. Re-quantify the filtered transcript catalog
SQANTI3 expression estimates are used for QC only—not for differential testing (ConesaLab SQANTI3 wiki; Pardo-Palacios et al., 2024). Pepkio re-runs IsoQuant quantification against the SQANTI3-filtered GTF per sample to produce final count matrices.
Tools and outputs
Tools used: IsoQuant 3.13.0 (--genedb with filtered GTF; --reference remains the genome FASTA)
Output: filtered_isoform_counts.tsv; filtered_gene_counts.tsv; TPM tables
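The relationship between the count and TPM tables can be shown with a short calculation. The sketch assumes that TPM here means counts scaled to one million assigned reads without length normalization, a common convention for full-length reads where one read represents one molecule. Whether a given tool uses exactly this definition should be checked against its documentation.

```python
def counts_to_tpm(counts):
    """Scale per-isoform read counts so that they sum to one million.

    counts: {isoform_id: read_count} for a single sample.
    No transcript-length normalization is applied (assumption stated above).
    """
    total = sum(counts.values())
    if total == 0:
        return {isoform: 0.0 for isoform in counts}
    return {isoform: count * 1_000_000 / total for isoform, count in counts.items()}


print(counts_to_tpm({"iso_a": 1, "iso_b": 3}))
```

Raw counts, not TPM, are what count-based models such as DESeq2 expect as input; TPM tables are mainly useful for visualization and cross-sample summaries.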
9. Test differential expression
Gene-level contrasts use DESeq2 with Benjamini–Hochberg FDR correction when ≥3 biological replicates per condition are available. Isoform-level testing with DRIMSeq and stageR is run only when replicate count and read depth support isoform-resolved power (Du et al., 2023; LRGASP Consortium, 2024; Nowicka & Robinson, 2016; Van den Berge et al., 2017). Batch is included in the design matrix when the same contrast spans multiple sequencing runs (Love et al., 2014).
Tools and outputs
Tools used: DESeq2 1.44.0; DRIMSeq 1.30.0; stageR 1.28.0
Output: deg_results.csv; differential_isoform_usage.csv; MA and volcano plots
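The padj column in the results comes from Benjamini-Hochberg adjustment. The sketch below shows the step-up procedure in plain Python for illustration. It is not the DESeq2 implementation, which additionally applies independent filtering before adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input.

    Each p-value is multiplied by m / rank (rank 1 = smallest p-value),
    then a running minimum is taken from the largest rank downward so
    that adjusted values are monotone in the original p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes or isoforms with an adjusted value below the chosen FDR threshold are reported as significant, and the expected proportion of false discoveries among them is controlled at that level.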
10. Package deliverables
Pepkio assembles figures, exports count tables, writes commented scripts, and drafts a Methods section citing software versions. Custom milestones are included when scoped at kickoff.
Tools and outputs
Tools used: R 4.4.x / Python 3.12 scripts; ggplot2 3.5.1
Output: Final deliverable bundle; HTML QC report; README; Methods draft

What Pepkio delivers

Processed data files

SQANTI3-filtered .gtf/.gff3; coordinate-sorted .bam + .bai; filtered_isoform_counts.tsv and filtered_gene_counts.tsv; SQANTI3_classification.txt; QC summaries.

Figures (PDF/SVG)

Read-length histograms; mapping-rate bar charts; gene/isoform saturation curves; SQANTI3 structural-category plots; MA and volcano plots when DE is in scope.

Tables

Key columns in SQANTI3_classification.txt (isoform, structural_category, associated_gene, coding, predicted_NMD); deg_results.csv (gene_id, log2FoldChange, padj); differential_isoform_usage.csv (isoform_id, gene_id, padj).
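As an example of working with the delivered classification table, the sketch below tallies isoforms per structural category from tab-delimited text. It assumes the file has a header row containing the columns named above. The isoform and gene identifiers in the example are made up.

```python
import csv
import io
from collections import Counter


def category_counts(classification_text):
    """Count isoforms per structural_category in a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(classification_text), delimiter="\t")
    return Counter(row["structural_category"] for row in reader)


# Toy table with made-up identifiers, built in memory for illustration.
header = ["isoform", "structural_category", "associated_gene", "coding", "predicted_NMD"]
rows = [
    ["PB.1.1", "full-splice_match", "GENE_A", "coding", "FALSE"],
    ["PB.1.2", "novel_in_catalog", "GENE_A", "coding", "FALSE"],
    ["PB.2.1", "full-splice_match", "GENE_B", "non_coding", "NA"],
    ["PB.3.1", "novel_not_in_catalog", "GENE_C", "coding", "TRUE"],
]
example_table = "\n".join("\t".join(fields) for fields in [header, *rows])

print(category_counts(example_table))
```

For a real deliverable, the same function can be applied to the text of SQANTI3_classification.txt read from disk.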

Code

Commented R and Python scripts per stage
Environment lock files (sessionInfo(), conda, or pip)
Delivery via private Git repository or agreed file transfer

Documentation

HTML/PDF QC report; README with reproduction instructions; Methods draft citing software versions and reference builds; post-delivery reviewer support for method clarification and minor revisions within agreed scope (typically ≤20% of deliverables).

Technical decisions we make — and why

Isoform reconstruction: IsoQuant (default) vs. FLAIR or Bambu: IsoQuant showed competitive F1-scores in LRGASP reference-guided benchmarks on annotated genomes (Prjibelski et al., 2023; LRGASP Consortium, 2024). FLAIR or Bambu on request when clients require a specific caller or comparison.
SQANTI3 filtering: rules-based (default) vs. ML filter: Rules-based filtering is interpretable and reproducible; ML filter when clients want more aggressive artifact removal (Pardo-Palacios et al., 2024).
Catalog build: pooled replicates vs. per-sample only: For discovery, Pepkio pools long-read samples to build a single experiment-level catalog, then re-quantifies per sample, matching the workflow recommended in the SQANTI3 documentation (ConesaLab SQANTI3 wiki).
Reference-guided vs. annotation-free: Reference-guided with GENCODE is default for human and mouse; annotation-free discovery for non-model organisms is scoped separately with orthogonal validation recommended (LRGASP Consortium, 2024).
Differential testing: gene-level DESeq2 vs. isoform-level DRIMSeq/stageR: Isoform-level testing only when replicates and depth support statistical power; otherwise gene-level DE with isoform catalog as descriptive output (Du et al., 2023).

Common questions

What is the minimum RNA input, RIN, and sequencing depth for long-read RNA-seq analysis?

Pepkio recommends ≥300 ng total RNA with RIN >7 and DNase treatment for PacBio Kinnex Iso-Seq libraries (Pacific Biosciences, 2024; UCSD gCore, 2024). PacBio recommends 5 million HiFi reads per sample for high-abundance isoform discovery and 10 million for moderate-to-rare isoforms or isoform-level differential expression (Pacific Biosciences, 2024). ONT depth is scaled using SG-NEx protocol benchmarks (Chen et al., 2025). ≥3 biological replicates per condition are recommended for DE. Exact targets are confirmed at kickoff.

Can you analyze low-yield or degraded RNA samples?

Yes, with limitations documented in the QC report. Samples below RIN 7 often produce truncated reads that inflate artifact isoforms in uncurated catalogs (Amarasinghe et al., 2020). Pepkio flags elevated truncation and low full-length fractions before SQANTI3 filtering. We discuss re-extraction or re-sequencing when yield cannot support the planned contrasts.

Do you support PacBio Revio Kinnex, ONT direct RNA, and ONT cDNA libraries?

Yes. PacBio Revio and Sequel II Kinnex HiFi data are processed via isoseq3 (when needed) and minimap2 splice:hq. ONT direct RNA, amplification-free cDNA, and PCR-cDNA libraries use minimap2 splice with ONT-appropriate parameters (Li, 2018; Prjibelski et al., 2023; Chen et al., 2025). Library-type-specific QC metrics are reported in the deliverable bundle.

How long does long-read RNA-seq analysis take at Pepkio?

A standard bulk cohort (roughly 6–12 samples, reference-guided, one contrast) typically completes in 4–6 weeks from data receipt. Annotation-free discovery, multi-condition isoform DE, or orthogonal short-read junction validation may take 6–10 weeks. Exact timelines are confirmed at kickoff.

How do you handle batch effects across sequencing runs or library prep batches?

For gene-level DESeq2, batch is included as a covariate in the design matrix when the same biological contrast spans multiple sequencing runs (Love et al., 2014). Isoform-level DRIMSeq models include batch when replicate structure supports it. Pepkio inspects top DE genes and marker isoforms after fitting batch covariates to confirm condition-associated signal is retained before final delivery.

Do I own the code — and in what format is it delivered?

Yes — you retain full ownership of all code, scripts, and results. Pepkio delivers commented R/Python scripts and environment lock files (sessionInfo(), conda, or pip). Count matrices and GTF files use standard formats readable in IGV, tappAS, or Bioconductor; R Markdown or Jupyter delivery is available on request.

Can I be involved during analysis?

Yes. Checkpoint reviews occur after read QC, after SQANTI3 filtering, and before final delivery. You can review structural-category distributions, adjust filter stringency within agreed scope, and request additional contrasts. A PhD-level scientific contact leads the project and incorporates your tissue-specific knowledge.

What does post-delivery reviewer support include?

Support covers clarification of computational methods, SQANTI3 filter thresholds, and minor figure or table revisions within agreed scope (typically ≤20% of deliverables). Pepkio drafts Methods and Supplementary text for analyses we performed. Substantial new analyses requested by reviewers are scoped separately.

Is co-authorship required?

No. Pepkio operates as a fee-for-service provider and does not require co-authorship unless explicitly discussed in advance. Standard practice is acknowledgment of bioinformatics support in the Acknowledgments section; co-authorship is considered only when Pepkio scientists make substantial intellectual contributions beyond routine analysis.

How many reads do I need to discover novel isoforms?

PacBio recommends 5 million HiFi reads per sample for high-abundance isoform discovery and 10 million for moderate-to-rare transcripts on Kinnex libraries (Pacific Biosciences, 2024). Downsampling of heart and brain Kinnex data suggests ~80% of known genes and isoforms may be detectable at 10–20 million reads per sample (Pacific Biosciences, 2024, ESHG poster). LRGASP advises additional replicates and orthogonal data when the goal is rare or novel transcript detection (LRGASP Consortium, 2024).
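The per-sample targets above translate into a simple multiplexing calculation. In the sketch below, the expected run yield is a hypothetical planning input that depends on instrument, chemistry, and library quality, and the 10% safety margin is an arbitrary illustrative choice. Neither is a figure from PacBio or Pepkio.

```python
import math


def samples_per_run(expected_run_reads, target_reads_per_sample, safety_margin=0.1):
    """Estimate how many samples can share one sequencing run.

    expected_run_reads: anticipated total reads for the run (hypothetical input).
    target_reads_per_sample: e.g. 5 or 10 million, per the guidance above.
    safety_margin: fraction of yield held back for uneven barcode balance.
    """
    usable_reads = expected_run_reads * (1 - safety_margin)
    return math.floor(usable_reads / target_reads_per_sample)


# Example with a purely illustrative run yield of 40 million reads.
print(samples_per_run(40_000_000, 10_000_000))
```

Rounding down rather than to the nearest integer keeps every sample at or above its depth target when the run performs as expected.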

How do you produce count matrices and validate novel isoforms after SQANTI3 filtering?

SQANTI3 read counts and TPM values are for structural QC only, not for statistical testing (ConesaLab SQANTI3 wiki; Pardo-Palacios et al., 2024). Pepkio re-runs IsoQuant against the SQANTI3-filtered GTF per sample to produce DESeq2- and DRIMSeq-ready count matrices. When matched short-read data are available, SQANTI3 short-read inputs (--short_reads for FASTQ files or --SR_bam for aligned BAMs) add junction validation metrics (Pardo-Palacios et al., 2024). RT-PCR primer design for experimental validation is scoped separately.

Can you handle custom or non-standard long-read RNA-seq analyses?

Yes. Beyond the standard FASTQ-to-filtered-GTF workflow, Pepkio scopes bespoke work at kickoff—custom inputs (pre-aligned BAM, client GTF), fusion prioritization, ORF prediction, allele-specific isoform phasing, single-cell long-read (MAS-ISO-seq), or integration with short-read or spatial data (Amarasinghe et al., 2020). Milestone pricing and timelines are confirmed before work begins.

Related services

Bulk RNA-seq — Gene-level differential expression when isoform resolution is unnecessary.
Single-cell RNA-seq — Cell-type heterogeneity without native full-length isoform capture.
Spatial transcriptomics — Tissue architecture and region-specific splicing context.
Whole-genome sequencing — Genomic variants and structural events that can affect splicing and transcript structure.
Custom consulting — Pre-sequencing depth, Kinnex multiplexing, and replicate design before library prep.

References

Amarasinghe SL, Su S, Dong X, et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biology. 2020;21(1):30. https://doi.org/10.1186/s13059-020-1935-5 (PMID: 32033565)
Pardo-Palacios FJ, Arzalluz-Luque Á, Kondratova L, et al. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nature Methods. 2024;21(5):793–797. https://doi.org/10.1038/s41592-024-02229-2 (PMID: 38509328)
Prjibelski AD, Mikheenko A, Joglekar A, et al. Accurate isoform discovery with IsoQuant using long reads. Nature Biotechnology. 2023;41(7):915–918. https://doi.org/10.1038/s41587-022-01565-y (PMID: 36593406)
Pardo-Palacios FJ, Wang D, Reese F, et al.; LRGASP Consortium. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nature Methods. 2024;21(7):1349–1363. https://doi.org/10.1038/s41592-024-02298-3 (PMID: 38849569)
Chen Y, Davidson NM, Wan YK, et al. A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nature Methods. 2025;22(4):801–812. https://doi.org/10.1038/s41592-025-02623-4 (PMID: 40082608)
Du MRM, Gouil Q, Kawaji H, et al. Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nature Methods. 2023;20(12):1810–1821. https://doi.org/10.1038/s41592-023-02026-3 (PMID: 37783886)
Aguzzoli Heberle B, Li H, Pardo-Palacios FJ, et al. Mapping medically relevant RNA isoform diversity in the aged human frontal cortex with deep long-read RNA-seq. Nature Biotechnology. 2024;42(11):1614–1622. https://doi.org/10.1038/s41587-024-02245-9 (PMID: 38778214)
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–3100. https://doi.org/10.1093/bioinformatics/bty191 (PMID: 29750242)
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15(12):550. https://doi.org/10.1186/s13059-014-0550-8 (PMID: 25516281)
Nowicka A, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Research. 2016;5:1356. https://doi.org/10.12688/f1000research.8930.1
Van den Berge K, Soneson C, Robinson MD, Clement L. stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage. Genome Biology. 2017;18(1):151. https://doi.org/10.1186/s13059-017-1277-0 (PMID: 28784146)
Pacific Biosciences. Application note: Kinnex full-length RNA kit for isoform sequencing. 2024. https://www.pacb.com/wp-content/uploads/Application-note-Kinnex-full-length-RNA-kit-for-isoform-sequencing.pdf
UCSD gCore Genomics Core. PacBio Revio bulk RNA Iso-Seq sample submission guidelines. 2024. https://gcore.ucsd.edu/RevioSub
ConesaLab. SQANTI3 wiki: Introduction to SQANTI3. https://github.com/ConesaLab/SQANTI3/wiki/Introduction-to-SQANTI3
ConesaLab. SQANTI3 releases (v6.0.1). https://github.com/ConesaLab/SQANTI3/releases/tag/v6.0.1
ablab. IsoQuant releases (v3.13.0). https://github.com/ablab/IsoQuant/releases/tag/v3.13.0
Pacific Biosciences. Assessment of read depth requirements for gene and isoform discovery (ESHG 2024 poster). 2024. https://www.pacb.com/wp-content/uploads/2024-eshg-RNA-isoform-human-heart-brain-short-and-long-read-sequencing-poster.pdf

Let's Talk About Your Science

Tell us:

• Your biological question
• Data type and size
• Timeline constraints

We'll tell you:

• What's feasible
• How long it will take
• Exactly what it will cost
