Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and gene expression profiling. It poses challenges, however, due to higher error rates than DNA data, the complexity of transcript diversity, and RNA editing events, among other factors. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data.
The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small-variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate that the benchmark set reliably identifies errors in challenging genomic regions and across short- and long-read callsets.
Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis.
Structural variants (SVs) drive gene expression in the human brain and cause many neurological conditions. However, most existing genetic studies have been based on short-read sequencing methods, which capture fewer than half of the SVs present in any one individual. Long-read sequencing (LRS) enhances our ability to detect disease-associated and functionally relevant SVs; however, its application in large-scale genomic studies has been limited by challenges in sample preparation and high costs.
Background: MECP2 Duplication Syndrome, also known as X-linked intellectual developmental disorder Lubs type (MRXSL; MIM: 300260), is a neurodevelopmental disorder caused by copy number gains spanning MECP2. Despite varying genomic rearrangement structures, including duplications and triplications, and a wide range of duplication sizes, no clear correlation exists between DNA rearrangement and clinical features. We previously demonstrated that up to 38% of MRXSL families are characterized by complex genomic rearrangements (CGRs) of intermediate complexity (2 ≤ copy number variant breakpoints < 5), yet the impact of these genomic structures on regulation of gene expression and phenotypic manifestations has not been investigated.