Background: False duplications in genome assemblies lead to false biological conclusions. We quantified false duplications in widely used previous genome assemblies for platypus, zebra finch, and Anna's hummingbird, and in their new counterparts for the same species generated by the Vertebrate Genomes Project, whose pipeline attempted to eliminate false duplications through haplotype phasing and purging. These assemblies are among the first generated by the Vertebrate Genomes Project for which a prior chromosome-level reference assembly was available for comparison.
Results: Whole-genome alignments revealed that 4 to 16% of the sequences are falsely duplicated in the previous assemblies, impacting hundreds to thousands of genes. These false duplications lead to overestimated gene family expansions. The main source of the false duplications is heterotype duplications, where the haplotype sequences were more divergent than in other parts of the genome, leading the assembly algorithms to classify them as separate genes or genomic regions. A minor source is sequencing errors. Ancient ATP nucleotide binding gene families have a higher prevalence of false duplications than other gene families. Although present in a smaller proportion, false duplications remain in the Vertebrate Genomes Project assemblies and can be identified and purged.
Conclusions: This study highlights the need for more advanced assembly methods that better separate haplotypes and sequence errors, and the need for cautious analyses of gene gains.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9516828 | PMC |
| http://dx.doi.org/10.1186/s13059-022-02764-1 | DOI Listing |
Int J Med Inform
January 2025
Centre for Primary Care and Public Health (Unisanté), University of Lausanne, Lausanne, Switzerland.
Background: Duplicate and near-duplicate medical documents are problematic in document management, clinical use, and medical research. In this study, we focus on multisourced medical documents in the context of a population-based cancer registry in Switzerland. Although the data collection process is well-regulated, the volume of transmitted documents steadily increases and the presence of full or near-duplicates slows down and complicates document processing.
J Pharm Biomed Anal
March 2025
Departamento de Farmácia, Faculdade de Ciências Farmacêuticas, Universidade de São Paulo, Av. Prof. Lineu Prestes, 580 - Bloco 15, SP, São Paulo CEP 05508-000, Brazil.
Measurement uncertainty is a critical factor in the reliability of pharmaceutical analyses, since it directly affects batch acceptance and regulatory compliance. While analytical uncertainty has been extensively studied, uncertainty arising from sampling remains less explored. This study aims to address this gap by evaluating the contributions of sampling and analytical uncertainties to the overall uncertainty for acetaminophen tablets and oral solution.
Sensors (Basel)
December 2024
School of Automation & Information Engineering, Sichuan University of Science & Engineering, Yibin 644000, China.
Lightweight object detection algorithms play a paramount role in unmanned aerial vehicles (UAVs) remote sensing. However, UAV remote sensing requires target detection algorithms to have higher inference speeds and greater accuracy in detection. At present, most lightweight object detection algorithms have achieved fast inference speed, but their detection precision is not satisfactory.
Zhonghua Yi Xue Yi Chuan Xue Za Zhi
December 2024
Center of Genetics and Prenatal Diagnosis, Department of Gynecology and Obstetrics, the First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China.
Objective: To summarize the results of prenatal diagnosis and outcome of pregnancy of fetuses with a high risk for 6p22.1.1-p21.
Can J Physiol Pharmacol
December 2024
Department of Pharmaceutical Science, Taneja College of Pharmacy, University of South Florida, Tampa, FL 33613, USA.
The FDA Adverse Event Reporting System (FAERS) is a large-scale repository of reports concerning adverse drug events (ADEs). The same published clinical study or report may be reviewed by multiple companies or healthcare professionals and reported separately to the FDA, leading to a significant presence of duplicate reports in FAERS. These duplicate records can result in the identification of false associations between a given drug and an ADE.