[DOI] 10.5524/100712 [Title] Supporting data for "Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C" [Release Date] 2020-02-18 [Citation] Rosen, BD; Enosi Tuipulotu, D; Field, MA; Dudchenko, O; Chan, EK; Minoche, AE; Barton, K; Lyons, RJ; Edwards, RJ; Hayes, VM; Omer, A; Colaric, Z; Keilwagen, J; Skvortsova, K; Bogdanovic, O; Aiden, EL; Smith, TP; Zammit, RA; Smith, MA (2020): Supporting data for "Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C" GigaScience Database. https://dx.doi.org/10.5524/100712 [Data Type] Genomic, Epigenomic [Data Summary] The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often the first choice for police and military work, as well as for protection, disability assistance and search-and-rescue. Yet GSDs are well known to be afflicted with a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life and prevent fully trained animals from continuing their duties. Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam_GSD) utilising a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is approximately 80 times as contiguous as the current canid reference genome, CanFam v3.1 (contig N50 of 20.9 Mb vs 0.267 Mb), and contains far fewer gaps (306 vs 23,876) and scaffolds (429 vs 3,310). Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps.
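The contig N50 values quoted above follow the usual definition: N50 is the length L such that contigs of length L or longer together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are hypothetical and do not come from this dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least 50% of the total length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Hypothetical toy contig lengths (bp), for illustration only.
print(n50([100, 80, 60, 40, 20]))  # 80

# Ratio of the reported contig N50 values (20.9 Mb vs 0.267 Mb).
print(round(20.9 / 0.267, 1))  # 78.3
```

The ratio of the two reported N50 values is about 78, which is consistent with the "approximately 80 times" figure in the summary.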
Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis of the genome assembly shows that 93.0% of the conserved single-copy genes are complete in the GSD assembly, compared with 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to about 99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely seven copies of the gene, indicative of a duplication of four ancestral copies and the disruption of one copy. The GSD genome assembly and annotation represent a major improvement in completeness, contiguity and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.
[File Location] https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/100001_101000/100712/
[File name] - [File Description]
mummer_snps_indels.vcf - VCF file generated with MUMmer show-snps
EXP_REFINEFINAL1.cmap - Assembly consensus genome map set derived from RawMolecules.bnx using Bionano Solve v3.2.2
Nala_HYBRID_SCAFFOLD_NCBI_Nresized_merged_NOT_SCAFFOLDED_HiC.fasta.gz - Hybrid genome assembly FASTA
GSD_mitochondria.fna - Complete mitochondrial sequence
MoleculeQualityReport.txt - Summary report of the RawMolecules.bnx data as produced by Bionano Solve v3.2.2
proteins.fasta - Translated coding-gene sequences (FASTA)
GSD_final_combined_assembly.fna - Final combined genome assembly FASTA
GCA_008641055.1_ASM864105v1_genomic.fna - Genome assembly FASTA
sample_metadata.txt - Tabular data with additional sample details
rRNA_predictions.gff - rRNA annotations
readme_100712.txt -
busco_Nala.txt - BUSCO summary for the assembly
busco_CanFam3.1.txt - BUSCO summary for the current dog reference genome CanFam3.1
mRNA_predictions.gff - Coding gene annotations
cds.fasta - Coding-gene nucleotide sequences (FASTA)
exp_informaticsReport.txt - Summary report of EXP_REFINEFINAL1.cmap, with alignment statistics against an in silico digested genome map set of CanFam 3.1, produced using Bionano Solve v3.2.2
[License] All files and data are distributed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/), unless specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments] [End]
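The FASTA files in this dataset (for example `GSD_final_combined_assembly.fna` or the gzip-compressed hybrid assembly) can be summarised with a short script. This is a minimal sketch using only the Python standard library, not the authors' method. It assumes plain or gzip-compressed FASTA and counts a gap as any run of `N`/`n` characters. That definition is an assumption and may differ from the one used for the gap counts reported in the summary. The helper names `open_text`, `read_fasta` and `summarise` are hypothetical.

```python
import gzip
import io
import re
from typing import Dict, Iterable, Iterator, TextIO, Tuple

# Assumption: a gap is any run of one or more N/n bases.
GAP_RE = re.compile(r"[Nn]+")


def open_text(path: str) -> TextIO:
    """Open a FASTA file as text, transparently handling .gz compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Record ID is the first whitespace-delimited token of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarise(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count sequences, total length and N-run gaps across FASTA records."""
    n_seqs = total_len = n_gaps = 0
    for _, seq in records:
        n_seqs += 1
        total_len += len(seq)
        n_gaps += len(GAP_RE.findall(seq))
    return {"sequences": n_seqs, "total_length": total_len, "gaps": n_gaps}


# Tiny in-memory demo; for a downloaded file use:
#   with open_text("GSD_final_combined_assembly.fna") as fh:
#       print(summarise(read_fasta(fh)))
demo = io.StringIO(">chrA test\nACGTNNNNACGT\nAC\n>chrB\nGGGGnnCC\n")
print(summarise(read_fasta(demo)))
```

The records are streamed one at a time, so whole-genome files do not need to be loaded into memory as a list of sequences.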