Calling large indels in 1047 Arabidopsis with IndelEnsembler.

Nucleic Acids Res

National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.

Published: November 2021

Large indels greatly affect observable phenotypes in many organisms, including plants and humans. Hence, detecting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous dataset, AthCNV (1). We captured nearly twice as many ground-truth deletions and, on average, 27% more ground-truth duplications than AthCNV, even though our dataset contains fewer large indels than AthCNV. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms (SNPs). Some of them could affect trait performance. For instance, a deletion-based genome-wide association study (DEL-GWAS) showed that accessions containing a 182-bp deletion in AT1G11520 had delayed flowering time, and all accessions in north Sweden carried the 182-bp deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. In SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time. This example demonstrates that existing large indel datasets miss phenotypic variation and that our large indel dataset fills this gap.
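The statement that many deletions have low LD with surrounding SNPs can be illustrated with a small calculation. The sketch below is not the authors' pipeline; it assumes fully homozygous (inbred) accessions coded as 0/1 per variant, and the function names `r_squared` and `max_ld_with_snps` are hypothetical. It computes the standard r² statistic between a deletion and each nearby SNP and reports the strongest value, which is one plausible way to decide whether a deletion is "tagged" by a SNP.

```python
from typing import Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """LD r^2 between two biallelic variants coded 0/1 per accession.

    Assumes homozygous accessions, so each genotype acts as one haplotype.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("genotype vectors must be non-empty and equal length")
    p_a = sum(a) / n                                  # frequency of allele 1 at variant a
    p_b = sum(b) / n                                  # frequency of allele 1 at variant b
    p_ab = sum(x * y for x, y in zip(a, b)) / n       # frequency of carrying both
    d = p_ab - p_a * p_b                              # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic variant carries no LD information; report 0.0.
    return 0.0 if denom == 0 else (d * d) / denom


def max_ld_with_snps(deletion: Sequence[int],
                     snps: Sequence[Sequence[int]]) -> float:
    """Highest r^2 between a deletion and any of the supplied nearby SNPs."""
    return max((r_squared(deletion, snp) for snp in snps), default=0.0)


if __name__ == "__main__":
    # Toy genotypes for four accessions (illustrative values, not real data).
    deletion = [1, 1, 0, 0]
    nearby_snps = [[1, 0, 1, 0], [1, 1, 0, 0]]
    print(max_ld_with_snps(deletion, nearby_snps))
```

A deletion whose maximum r² falls below a chosen cutoff would be counted as poorly tagged; the cutoff is an analysis choice, and the abstract does not state which one was used.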

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565333
DOI: http://dx.doi.org/10.1093/nar/gkab904

Similar Publications

Background: Chronic traumatic encephalopathy (CTE) is a neurodegenerative disease associated with repetitive head impact (RHI) although little is known about its molecular pathogenesis. Previous studies of single neurons showed that private somatic mutations increase both during normal aging and in neurodegenerative disorders, and show diverse mutational patterns.

Method: We applied two orthogonal single-nucleus whole-genome sequencing (snWGS) methods to neurons isolated from the prefrontal cortex of 15 individuals with CTE, and 4 individuals with RHI but no CTE diagnosis, and compared mutational rates and spectra with neurons from neurotypical controls and Alzheimer's disease (AD).

Genomic Patterns are Associated with Different Sequelae of Patients with Long-Term COVID-19.

Adv Sci (Weinh)

December 2024

State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory of Pathobiology Ministry of Education, China-Japan Union Hospital of Jilin University, Changchun, 130033, China.

In the post-large era, various COVID-19 sequelae are receiving increasing attention as health problems. Although the mortality rate of COVID-19 infection is now declining, infection is often accompanied by new clinical sequelae with different symptoms, such as post-infection fatigue and loss of smell. Age, gender and degree of virus infection seem to be only weakly correlated with clinical symptoms.

Comparison of genotyping assays for detection of targeted CRISPR/Cas mutagenesis in highly polyploid sugarcane.

Front Genome Ed

December 2024

Agronomy Department, Plant Molecular and Cellular Biology Program, Genetics Institute, University of Florida, IFAS-Institute of Food and Agricultural Science, Gainesville, FL, United States.

Sugarcane (Saccharum spp.) is an important biofuel feedstock and a leading source of global table sugar. Saccharum hybrid cultivars are highly polyploid (2n = 100-130), containing large numbers of functionally redundant hom(e)ologs in their genomes.

Driver mutation landscape of acute myeloid leukemia provides insights for neoantigen-based immunotherapy.

Cancer Lett

December 2024

Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.

Acute myeloid leukemia (AML) has lagged in benefiting from immunotherapies, primarily due to the scarcity of actionable AML-specific antigens. Driver mutations represent promising immunogenic targets, but the AML neoantigen landscape and its impact on patient outcomes and the AML immune microenvironment have not been comprehensively characterized. Herein, we conducted matched DNA and RNA sequencing on 304 AML patients and extensively integrated data from an additional ∼2,500 AML cases, identifying 49 driver genes, notably characterized by a significant proportion of insertions and deletions (indels).

Intra-host variability of SARS-CoV-2: Patterns, causes and impact on COVID-19.

Virology

December 2024

Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Avenida Rivadavia 1917, C1083ACA Ciudad Autónoma de Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular (LVGM), Facultad de Ciencias Naturales y Ciencias de la Salud, Universidad Nacional de la Patagonia San Juan Bosco, Belgrano 160, Trelew, CP, 9100, Argentina.

Intra-host viral variability is related to pathogenicity, persistence, drug resistance, and the emergence of new clades. This work reviews the large amount of data on SARS-CoV-2 intra-host variability accumulated to date, addressing known and potential implications in COVID-19 and the emergence of VOCs and lineage-defining mutations. Topics covered include the distribution of intra-host polymorphisms across the genome, the corresponding mutational signatures, their patterns of emergence and extinction throughout infection, and the processes governing their abundance, frequency, and type (synonymous, nonsynonymous, indels, nonsense).
