Population-scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remain a formidable challenge for many research groups. Large sequencing studies, such as the 1000 Genomes Project, have improved our understanding of human demography and of the effect of rare genetic variation on disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible, given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we identified 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in parallel, we processed the data within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of how cost and running time depend on data size suggests that, given near-linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future.
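As a quick check of the reported figures, the per-sample cost follows from dividing each total by the 2,535 samples: $18,590 / 2,535 ≈ $7.33 for completed jobs, and $21,805 / 2,535 ≈ $8.60 when all jobs are counted. The sketch below reproduces that arithmetic and adds a projection for a larger study. The projection assumes cost grows strictly linearly with sample count, an idealization of the "near-linear scalability" mentioned above rather than a result from the study; the function names and the 10,000-sample figure are hypothetical.

```python
# Figures reported in the abstract.
N_SAMPLES = 2_535
COST_COMPLETED_USD = 18_590  # completed jobs only
COST_ALL_USD = 21_805        # all jobs, including ones that did not complete


def cost_per_sample(total_usd: float, n_samples: int) -> float:
    """Average compute cost per sample, in US dollars."""
    return total_usd / n_samples


def extrapolate_cost(per_sample_usd: float, n_samples: int) -> float:
    """Projected total cost, ASSUMING cost scales linearly with sample count."""
    return per_sample_usd * n_samples


completed = cost_per_sample(COST_COMPLETED_USD, N_SAMPLES)
overall = cost_per_sample(COST_ALL_USD, N_SAMPLES)
print(f"completed jobs: ${completed:.2f}/sample")  # completed jobs: $7.33/sample
print(f"all jobs:       ${overall:.2f}/sample")    # all jobs:       $8.60/sample

# Hypothetical 10,000-sample study under the linear-scaling assumption.
print(f"10,000 samples: ~${extrapolate_cost(completed, 10_000):,.0f}")  # ~$73,333
```

The gap between the two per-sample figures is the overhead of jobs that did not complete; any real projection would also need to account for that overhead and for departures from linear scaling.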
| Source | Link |
|---|---|
| PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4482534 |
| PLOS | http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0129277 |
BMC Bioinformatics
January 2025
Auburn University, Auburn, AL, 36849, USA.
Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high-fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing each circularized DNA molecule multiple times and combining the passes into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling.
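The consensus idea described above can be illustrated with a toy example. The sketch below assumes the passes over one molecule are already aligned to equal length, takes a column-wise majority vote, and converts a naive per-base error estimate into a Phred-scaled quality value, QV = -10·log10(p). This is a deliberately simplified illustration, not the algorithm PacBio's CCS software uses; the function name, the error model, and the 1/(n+1) floor are assumptions made for the example.

```python
import math
from collections import Counter


def majority_consensus(passes: list[str]) -> tuple[str, list[float]]:
    """Toy consensus: column-wise majority vote over pre-aligned passes.

    ASSUMPTION: all passes are already aligned and of equal length.
    Returns the consensus string and a per-base Phred-scaled QV, where the
    error probability is estimated as the fraction of passes disagreeing
    with the majority base.
    """
    if not passes or len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and of equal length")
    n = len(passes)
    consensus, qvs = [], []
    for column in zip(*passes):
        # Ties go to the base seen first in the column.
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        # Floor the error estimate at 1/(n+1) so unanimous columns get a finite QV.
        p_err = max((n - count) / n, 1 / (n + 1))
        qvs.append(-10 * math.log10(p_err))
    return "".join(consensus), qvs


passes = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGTAA"]
seq, qv = majority_consensus(passes)
print(seq)                        # ACGTAC
print([round(q, 1) for q in qv])  # [7.8, 7.8, 7.0, 7.8, 7.8, 7.0]
```

With only five passes, even unanimous columns reach just Q7.8 under this crude model; production consensus tools use considerably richer error models to reach the accuracy described above.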
Am J Hum Genet
January 2025
Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA; Cancer Center, Medical College of Wisconsin, Milwaukee, WI, USA. Electronic address:
Mosaic loss of Y (mLOY) is the most common somatic chromosomal alteration detected in human blood. The presence of mLOY is associated with altered blood cell counts and increased risk of Alzheimer disease, solid tumors, and other age-related diseases. We sought to gain a better understanding of genetic drivers and associated phenotypes of mLOY through analyses of whole-genome sequencing (WGS) of a large set of genetically diverse males from the Trans-Omics for Precision Medicine (TOPMed) program.
Primary ciliary dyskinesia (PCD, OMIM 244400) is a rare genetic disorder that affects motile cilia and is characterised by impaired mucociliary clearance of the airway epithelium, which results in chronic upper and lower airway infections. While short-read next-generation sequencing technology has been used for the genetic testing of PCD, its effectiveness is limited in identifying variants in the gene because of the nearly identical pseudogene. As we confirmed that the gene was not expressed in airway cells, we obtained nasal mucosa biopsy specimens for total RNA sequencing (RNA-seq) with library enrichment using exome oligos. Among the 34 nasal samples from patients suspected of having PCD, three aberrant splicing patterns in were identified in two samples.
bioRxiv
January 2025
Department of Computer Science, School of Computing and Data Science, University of Hong Kong, Hong Kong, China.
Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and gene expression profiling. It poses challenges, however, due to error rates higher than those of DNA data, the complexity of transcript diversity, and RNA editing events, among other factors. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data.
Atheroscler Plus
March 2025
Department of Laboratory Medicine, Faculty of Medicine and Health, Örebro University, Örebro, Sweden.
Background And Aims: Familial hypercholesterolemia (FH) and other disorders with similar features are common genetic disorders that remain underdiagnosed and undertreated, due in part to the cost of screening. The aim of this study was to design and implement a whole-gene targeted NGS panel for the molecular diagnosis of FH and statin intolerance, with an emphasis on high-quality variant calling, including copy number analysis.
Methods: A whole-gene panel for hybridisation-based short-read NGS was designed for the dominant FH genes low density lipoprotein receptor (LDLR), apolipoprotein B (APOB), proprotein convertase subtilisin/kexin type 9 (PCSK9), and apolipoprotein E (APOE); the recessive FH genes low density lipoprotein receptor adaptor protein 1 (LDLRAP1), ATP binding cassette subfamily G member 5/8 (ABCG5/8), and lipase A, lysosomal acid type (LIPA); and solute carrier organic anion transporter family member 1B1 (SLCO1B1), which is not an FH gene but is linked to statin intolerance.