Replicate whole-genome next-generation sequencing data derived from Caucasian donor saliva samples.

Data Brief

Haematology-Pathology Research Laboratory, Research Unit for Haematology and Research Unit for Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark.

Published: October 2021

Next-generation sequencing (NGS) of whole genomes has become more accessible to biomedical researchers as sequencing prices continue to drop and more laboratories either run their own NGS facilities or have access to a core facility. However, the rapid and robust development of practical bioinformatics pipelines depends partly on convenient access to data for testing algorithms, and publicly available data sets help meet that need. Here, we provide a triplicate whole-genome paired-end sequencing data set consisting of 1.38 billion raw sequencing reads derived from saliva DNA from a single anonymous male Caucasian donor, with average sequencing depths targeted at 30x for two of the samples and 4x for a low-coverage sample. The raw number of single nucleotide variants ranged from 3.3 to 4 million, and the median variant read depths of GATK4-passed variants in the three samples were 22, 18, and 10. Of all variants, 81% were found in two or three of the samples, whereas 19% were singletons. The karyotype was evaluated as 46,XY with no apparent copy-number variation.
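The split between shared variants and singletons reported above comes down to counting how many of the three call sets each variant appears in. The following is a minimal sketch of that tally, not the authors' pipeline. It assumes each sample's variants have already been parsed into a set of (chromosome, position, ref, alt) tuples; the function name `concordance` and the toy variants are hypothetical and are not taken from the data set.

```python
from collections import Counter


def concordance(sample_variants):
    """Tally shared variants and singletons across replicate call sets.

    sample_variants: list of sets, one per sample, each holding
    (chrom, pos, ref, alt) tuples. This representation is an assumption
    made for illustration.
    """
    counts = Counter()
    for variants in sample_variants:
        # Each set contributes a given variant at most once, so the
        # counter value is the number of samples carrying that variant.
        counts.update(variants)

    total = len(counts)
    shared = sum(1 for n in counts.values() if n >= 2)
    singletons = total - shared
    return {
        "total": total,
        "shared": shared,
        "singletons": singletons,
        "shared_fraction": shared / total if total else 0.0,
        "singleton_fraction": singletons / total if total else 0.0,
    }


# Toy call sets (hypothetical values, for illustration only).
sample_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
sample_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 10, "T", "C")}
sample_c = {("chr1", 100, "A", "G"), ("chr4", 5, "A", "T")}

result = concordance([sample_a, sample_b, sample_c])
print(result)
```

Keying on the full (chrom, pos, ref, alt) tuple means two samples only count as agreeing when they report the same alternate allele at the same site; matching on position alone would give a looser measure of agreement.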


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427263
DOI: http://dx.doi.org/10.1016/j.dib.2021.107349
