Supporting data for "Leveraging multiple transcriptome assembly methods for improved gene structure annotation"
===============================================================================================================

Venturini L; Caim S; Kaithakottil GG; Mapleson DL; Swarbreck D (2018): Supporting data for "Leveraging multiple transcriptome assembly methods for improved gene structure annotation". GigaScience Database. http://dx.doi.org/10.5524/100464

Summary:
--------
The performance of RNA-Seq aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-Seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artefacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available at https://github.com/lucventurini/Mikado.

Files:
------
mikado-master.zip
    Archival copy of the GitHub repository https://github.com/lucventurini/mikado, downloaded 24-May-2018.
    Mikado - pick your transcript: a pipeline to determine and select the best RNA-Seq prediction.

mikado-analysis-master.zip
    Archival copy of the GitHub repository https://github.com/lucventurini/mikado-analysis, downloaded 24-May-2018.
    This repository contains the scripts used for the Mikado analyses.

Input_assemblies_real_data.zip
    Assemblies derived from real data; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Input_assemblies_simulated_data.zip
    Assemblies derived from simulated data; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Input_assemblies_multiple_Isoform_Fraction.zip
    StringTie and CLASS2 assemblies derived by varying the Minimum Isoform Fraction parameter; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Input_assemblies_multiple_samples.zip
    Assemblies derived from real RNA-Seq data from multiple samples of A. thaliana; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Input_assemblies_pacbio.zip
    Alignments and assemblies of Illumina and PacBio reads; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Comparisons_real_and_simulated.zip
    Comparisons for the real and simulated datasets; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Multiple_Isoform_Fractions_comparisons.zip
    Comparisons for StringTie/CLASS2 and the derived Mikado results obtained by varying the Minimum Isoform Fraction (MIF) parameter; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Comparisons_multiple_samples.zip
    Comparisons for the assemblies and combinations obtained by analysing multiple samples of A. thaliana; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Comparisons_pacbio.zip
    Comparisons for the analysis of human Illumina and PacBio data (raw assemblies/alignments, EvidentialGene, and Mikado); downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Combiners_real_data.zip
    Results of different combiners for different species using real data; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Combiners_simulated_data.zip
    Results of different combiners for different species using simulated data; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Mikado_Multiple_Isoform_Fractions.zip
    Results of running Mikado on the output of CLASS2 and StringTie after varying their minimum isoform fraction setting; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Combiners_multiple_samples.zip
    Results of different combiners and selectors on multiple RNA-Seq assemblies performed with the same tool on twelve different samples of A. thaliana; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Multiple_samples_MAKER.zip
    Results of running MAKER on the assemblies; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149

Combiners_pacbio.zip
    Results of Mikado, naive combination and EvidentialGene on mixtures of Illumina and PacBio human data; downloaded 26-Jun-2018.
    https://figshare.com/projects/Leveraging_multiple_transcriptome_assembly_methods_for_improved_gene_structure_annotation/26149
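
The archives listed above can be inspected before unpacking. The following is only a minimal sketch, assuming Python 3 and that the files keep the names given in this README; the helper name `list_archive_members` is hypothetical and is not part of Mikado or of the deposited scripts.

```python
import zipfile


def list_archive_members(archive):
    """Return (member name, uncompressed size in bytes) for each file in a zip.

    `archive` may be a filesystem path or a binary file-like object.
    Directory entries are skipped.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()
        ]
```

For example, `list_archive_members("Comparisons_pacbio.zip")` would return the file names and sizes contained in that archive, provided it has been downloaded to the working directory.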