[DOI]
10.5524/102736
[Title]
Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."
[Release Date]
2025-08-06
[Citation]
Chang JJ; Yang X; Teng H; Reames B; Corbin V; Coin LJ (2025): Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing." GigaScience Database. https://dx.doi.org/10.5524/102736
[Dataset Type]
Software, Bioinformatics
[Dataset Summary]
Polyadenylation is a dynamic process that is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysing the transcriptome and epi-transcriptome. Several tools are currently available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates that lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation, and ease of use through integration with base-calling.
[File name] - [File Description] - [File Location]
readme_102736.txt - - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/readme_102736.txt
all_boostnano_R1_tails.csv - Poly(A) tail lengths as found by BoostNano for all R1 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_boostnano_R1_tails.csv
all_boostnano_R2_tails.csv - Poly(A) tail lengths as found by BoostNano for all R2 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_boostnano_R2_tails.csv
all_dorado_053_R1_tails.csv - Poly(A) tail lengths as found by Dorado version 0.5.3 for all R1 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_dorado_053_R1_tails.csv
all_dorado_053_R2_tails.csv - Poly(A) tail lengths as found by Dorado version 0.5.3 for all R2 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_dorado_053_R2_tails.csv
all_nanopolish_R1_tails.tsv - Poly(A) tail lengths as found by nanopolish for all R1 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_nanopolish_R1_tails.tsv
all_nanopolish_R2_tails.tsv - Poly(A) tail lengths as found by nanopolish for all R2 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_nanopolish_R2_tails.tsv
all_tailfindr_R1_tails.csv - Poly(A) tail lengths as found by tailfindr for all R1 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_tailfindr_R1_tails.csv
all_tailfindr_R2_tails.csv - Poly(A) tail lengths as found by tailfindr for all R2 sequins; underlying data for figure 1 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/all_tailfindr_R2_tails.csv
boostnano_no_dorado_R1_tails.csv - Poly(A) tail lengths as found by BoostNano for R1 sequins that were filtered out by Dorado but kept by BoostNano; underlying data for figure 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/boostnano_no_dorado_R1_tails.csv
boostnano_no_dorado_R2_tails.csv - Poly(A) tail lengths as found by BoostNano for R2 sequins that were filtered out by Dorado but kept by BoostNano; underlying data for figure 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/boostnano_no_dorado_R2_tails.csv
sequins_caco_24hpi_0.fast5 - Raw data for sequins in Caco cells at 24hpi - 1 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_caco_24hpi_0.fast5
sequins_caco_24hpi_1.fast5 - Raw data for sequins in Caco cells at 24hpi - 2 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_caco_24hpi_1.fast5
sequins_caco_48hpi_0.fast5 - Raw data for sequins in Caco cells at 48hpi - 1 of 4 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_caco_48hpi_0.fast5
sequins_caco_48hpi_1.fast5 - Raw data for sequins in Caco cells at 48hpi - 2 of 4 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_caco_48hpi_1.fast5
sequins_caco_48hpi_2.fast5 - Raw data for sequins in Caco cells at 48hpi - 3 of 4 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_caco_48hpi_2.fast5
sequins_caco_48hpi_3.fast5 - Raw data for sequins in Caco cells at 48hpi - 4 of 4 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_caco_48hpi_3.fast5
sequins_calu_24hpi_0.fast5 - Raw data for sequins in Calu cells at 24hpi - 1 of 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_calu_24hpi_0.fast5
sequins_calu_24hpi_1.fast5 - Raw data for sequins in Calu cells at 24hpi - 2 of 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_calu_24hpi_1.fast5
sequins_calu_24hpi_2.fast5 - Raw data for sequins in Calu cells at 24hpi - 3 of 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_calu_24hpi_2.fast5
sequins_calu_48hpi_0.fast5 - Raw data for sequins in Calu cells at 48hpi - 1 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_calu_48hpi_0.fast5
sequins_calu_48hpi_1.fast5 - Raw data for sequins in Calu cells at 48hpi - 2 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_calu_48hpi_1.fast5
sequins_vero_24hpi_0.fast5 - Raw data for sequins in Vero cells at 24hpi - 1 of 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_vero_24hpi_0.fast5
sequins_vero_24hpi_1.fast5 - Raw data for sequins in Vero cells at 24hpi - 2 of 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_vero_24hpi_1.fast5
sequins_vero_24hpi_2.fast5 - Raw data for sequins in Vero cells at 24hpi - 3 of 3 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_vero_24hpi_2.fast5
sequins_vero_2hpi_0.fast5 - Raw data for sequins in Vero cells at 2hpi - 1 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_vero_2hpi_0.fast5
sequins_vero_2hpi_1.fast5 - Raw data for sequins in Vero cells at 2hpi - 2 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_vero_2hpi_1.fast5
sequins_vero_48hpi_0.fast5 - Raw data for sequins in Vero cells at 48hpi - 1 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_vero_48hpi_0.fast5
sequins_vero_48hpi_1.fast5 - Raw data for sequins in Vero cells at 48hpi - 2 of 2 - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/sequins_vero_48hpi_1.fast5
vero_2hpi_sequins_boostnano_short_tails_10.bam - Alignment for sequins from Vero cells at 2hpi whose tails were predicted by BoostNano to be less than 10 bases long; underlying data for figure 2B, 2C & 2D - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/vero_2hpi_sequins_boostnano_short_tails_10.bam
vero_2hpi_sequins_sorted.bam - Alignment for sequins from Vero cells at 2hpi; underlying data for figure 2A - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/vero_2hpi_sequins_sorted.bam
BoostNano-master.zip - Archival copy of the GitHub repository https://github.com/haotianteng/BoostNano downloaded 18-July-2025. BoostNano is a tool for preprocessing ONT-Nanopore RNA sequencing reads. This project is licensed under the MPL 2.0 license. Please refer to the GitHub repo for the most recent updates. - https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip
[License]
All files and data are distributed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/), unless specifically stated otherwise; see http://gigadb.org/site/term for more details.
[Comments]
[End]