[DOI] 10.5524/100726
[Title] Supporting data for "The Gene-Rich Genome of the Scallop Pecten maximus"
[Release Date] 2020-03-19
[Citation] Dudchenko, O; Aiden, EL; Kenny, NJ; James, K; Betteridge, E; Corton, C; Dolucan, J; Mead, D; Oliver, K; Omer, A; Pelan, S; Ryan, Y; Sims, Y; Skelton, J; Smith, M; Torrance, J; Weisz, D; Wipat, A; Howe, K; Williams, ST (2020): Supporting data for "The Gene-Rich Genome of the Scallop Pecten maximus" GigaScience Database. https://dx.doi.org/10.5524/100726
[Data Type] Genomic
[Data Summary] The King Scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery, and its ubiquity means that it plays a key role in coastal ecosystems and food webs. Like other filter-feeding bivalves, it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies and fisheries management.
Here we report the genome sequencing of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10x Chromium and Hi-C data, and its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb, and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species.
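The contiguity figures quoted above (scaffold N50 and the share of sequence held in the 19 longest scaffolds) can be recomputed from any FASTA file of the assembly. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and it assumes the assembly file listed in this record (xPecMax.fa.gz) is a standard gzip-compressed FASTA that has already been downloaded locally.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: List[int]) -> int:
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequences given")


def top_fraction(lengths: List[int], k: int) -> float:
    """Return the fraction of all bases contained in the k longest sequences."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def assembly_stats(path: str, k: int = 19) -> Tuple[int, int, float]:
    """Return (scaffold count, N50, fraction in top k) for a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        lengths = list(fasta_lengths(handle))
    return len(lengths), n50(lengths), top_fraction(lengths, k)
```

Calling `assembly_stats("xPecMax.fa.gz")` on a local copy of the assembly would return the scaffold count, the N50 in bases, and the fraction of sequence in the 19 longest scaffolds, which can then be compared with the values reported in this summary.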
The genome assembly of Pecten maximus and its annotated gene set provide a high-quality platform for a wide range of investigations, including studies on such disparate topics as shell biomineralization, pigmentation, vision and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known as a gene conferring resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid.
[File Location] https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/100001_101000/100726/
[File name] - [File Description]
Pecten_maximus_high_confidence_peptides.fa.gz - Amino acid sequence of filtered gene models - Final gene set
PecMax_hardmasked_genome.fa.gz - Hard masked ('N') repeats version of genome
Trinity_scallop_novel_transcriptome.fa.gz - Novel transcriptome assembly
PecMax_softmasked_genome.fa.gz - Soft masked (lower case) repeats version of genome
NanoCompPacBioReads.zip - Read quality assessment, NanoComp (manuscript Supplementary File 1)
xPecMax.fa.gz - Genome assembly
AdditionalBlobplotFiles.zip - Additional Blobplot plots and data, including those separated by phylum/superkingdom. (manuscript Supplementary File 2)
multiqc_report_10x_reads.html - Read quality assessment, FastQC
Pecten_maximus_all_blasthits.txt - BLAST annotations, Pecten maximus gene models. (manuscript Supplementary File 4)
ReadsPerGene.zip - ReadsPerGene files output by STAR. Zipped text file (manuscript Supplementary File 3)
Pecten_maximus_all_blasthits.txt - Best blast hit of all gene models - no filtering of gene models
KEGG-KAAS-annotations.txt - KEGG-KAAS annotations, Pecten maximus gene models. (manuscript Supplementary File 5)
Pecten_maximus_genome.gtf.gz - GTF annotation file of all gene models - no filtering of gene models
Pecten_maximus_genome.gff3.gz - GFF3 annotation file of all gene models - no filtering of gene models
Pecten_maximus_high_confidence_genes.gtf.gz - GTF annotation of filtered gene models - Final gene set
Pecten_maximus_all_gene_cds_nucl.fa.gz - CDS sequence of all gene models - no filtering of gene models
Pecten_maximus_high_confidence_cds.fa.gz - CDS sequence of filtered gene models - Final gene set
readme_100726.txt -
Pecten_maximus_all_peptides.fa.gz - Amino acid sequence of all gene models - no filtering of gene models
[License] All files and data are distributed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/), unless specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments]
[End]