Storing biologically equivalent indels as distinct entries in databases causes data redundancy and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Such a system is also desirable for comparing the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system; the same coordinates can also be used to compare different indel calling results. UPS-indel identifies 15% of indels as redundant in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with the existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel identifies 456,352 more redundant indels in dbSNP, 2,118 more in COSMIC coding, and 553 more in the COSMIC noncoding indel dataset, in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels and is thus exhaustive.
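To make the notion of equivalent indels concrete, the following minimal Python sketch checks whether two indel records produce the same mutated sequence when applied to the same reference. It is a brute-force illustration of the definition only, not the UPS-indel algorithm; the function names `apply_indel` and `are_equivalent`, the toy reference string, and the VCF-style `(pos, ref_allele, alt_allele)` tuples are all assumptions made for this example.

```python
def apply_indel(ref: str, pos: int, ref_allele: str, alt_allele: str) -> str:
    """Apply one VCF-style variant (1-based pos) to ref and return the result."""
    start = pos - 1
    # Guard against records whose REF allele disagrees with the reference.
    if ref[start:start + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference at this position")
    return ref[:start] + alt_allele + ref[start + len(ref_allele):]


def are_equivalent(ref: str, indel_a: tuple, indel_b: tuple) -> bool:
    """Two indels are equivalent if they yield the same mutated sequence."""
    return apply_indel(ref, *indel_a) == apply_indel(ref, *indel_b)


# Toy reference containing a CA repeat (hypothetical data).
reference = "GCACACAT"

# The same deletion of one "CA" unit, recorded at two different positions.
deletion_left = (1, "GCA", "G")
deletion_right = (5, "ACA", "A")

# A different deletion (removes the final "T") for contrast.
deletion_other = (7, "AT", "A")

print(are_equivalent(reference, deletion_left, deletion_right))
print(are_equivalent(reference, deletion_left, deletion_other))
```

Inside the repeat, the two deletion records differ in position and alleles yet describe the same biological event, which is the kind of redundancy the abstract refers to. Comparing full mutated sequences is only practical for short toy references; it is shown here to illustrate the definition, not as a scalable method.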
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5658412 | PMC |
| http://dx.doi.org/10.1038/s41598-017-14400-1 | DOI Listing |
Nucleic Acids Res
September 2024
Genome Integrity & Structural Biology Laboratory, NIH/NIEHS, DHHS, Research Triangle Park, NC 27709, USA.
The endonuclease activity of Pms1 directs mismatch repair by generating a nick in the newly replicated DNA strand. Inactivating Pms2, the human homologue of yeast Pms1, increases the chances of colorectal and uterine cancers. Here we use whole genome sequencing to show that loss of this endonuclease activity, via the pms1-DE variant, results in strong mutator effects throughout the Saccharomyces cerevisiae genome.
Comput Struct Biotechnol J
December 2024
Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea.
High-yield production of therapeutic protein using Chinese hamster ovary (CHO) cells requires stable cell line development (CLD). CLD typically uses random integration of transgenes; however, this results in clonal variation and subsequent laborious clone screening. Therefore, site-specific integration of a protein expression cassette into a desired chromosomal locus showing high transcriptional activity and stability, referred to as a hot spot, is emerging.
J Mol Biol
September 2024
The Jackson Laboratory, Bar Harbor, ME, USA.
The Mouse Variation Registry (MVAR) resource is a scalable registry of mouse single nucleotide variants and small indels and variant annotation. The resource accepts data in standard Variant Call Format (VCF) and assesses the uniqueness of the submitted variants via a canonicalization process. Novel variants are assigned a unique, persistent MVAR identifier; variants that are equivalent to an existing variant in the resource are associated with the existing identifier.
BMC Genomics
February 2024
Laboratory of Transcriptional Regulation, Institute of Molecular Genetics of the Czech Academy of Sciences, Vídeňská 1083, Prague, 142 20, Czech Republic.
Background: Whole exome sequencing (WES) and whole genome sequencing (WGS) have become standard methods in human clinical diagnostics as well as in population genomics (POPGEN). Blood-derived genomic DNA (gDNA) is routinely used in the clinical environment. Conversely, many POPGEN studies and commercial tests benefit from easy saliva sampling.
Forensic Sci Int
May 2023
Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, 510515 Guangzhou, China; Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China; Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an, China.
The insertion/deletion (InDel) polymorphism has promising applications in forensic DNA analysis. However, the insufficient forensic efficiencies of the present InDel-based systems restrict their applications in parentage testing, due to the lower genetic polymorphism of the biallelic InDel locus and the limited number of InDel loci in a multiplex amplification system. Here, we introduced an in-house developed system which contained 41 polymorphic Multi-InDel markers (equivalent to 82 InDels in total), to serve as an efficient and reliable tool for different forensic applications in the Manchu and Mongolian groups.