Based on the k-mer model of protein sequences, a novel k-mer natural vector method is proposed to characterize the k-mers in a protein sequence, taking into account both the numbers and the distributions of the k-mers. It is proved that the correspondence between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily, without requiring evolutionary models or human intervention. However, there is no criterion for choosing a suitable k, and k strongly influences both the results obtained and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. Results of phylogenetic analysis on three protein datasets demonstrate that the new method can precisely describe the evolutionary relationships of proteins and greatly improves computational efficiency.
Source: http://dx.doi.org/10.1016/j.ygeno.2018.08.010 (DOI listing)
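The abstract does not spell out the vector's components, so the following is only a minimal sketch under stated assumptions. It assumes the definition carried over from the original natural vector for nucleotide sequences: for each of the 20^k possible k-mers it records the count n_i, the mean 1-based start position mu_i, and a normalized second central moment D_i (sum of squared deviations from mu_i divided by n_i times the sequence length). The function names, the skipping of non-standard residues such as X, and the reading of "compound" as a concatenation of the vectors for several values of k are illustrative assumptions, not the paper's confirmed definitions. Proteins are compared by the Euclidean distance between their vectors.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids; k-mers containing any other symbol (e.g. X)
# are skipped. This filtering rule is an assumption for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_natural_vector(seq: str, k: int) -> list[float]:
    """Return [n_1..n_m, mu_1..mu_m, D_1..D_m] over all m = 20**k k-mers.

    n_i:  number of occurrences of k-mer i
    mu_i: mean 1-based start position of those occurrences (0 if absent)
    D_i:  sum of squared deviations from mu_i divided by n_i * len(seq)
          (assumed normalization, borrowed from the original natural vector)
    """
    seq = seq.upper()
    n = len(seq)
    positions: dict[str, list[int]] = {}
    for start in range(n - k + 1):
        kmer = seq[start:start + k]
        if all(c in AMINO_ACIDS for c in kmer):
            positions.setdefault(kmer, []).append(start + 1)

    counts, means, moments = [], [], []
    # Iterate over every possible k-mer in a fixed order so that vectors of
    # different proteins have the same length and can be compared directly.
    for letters in product(AMINO_ACIDS, repeat=k):
        occ = positions.get("".join(letters), [])
        count = len(occ)
        mu = sum(occ) / count if count else 0.0
        d = sum((t - mu) ** 2 for t in occ) / (count * n) if count else 0.0
        counts.append(float(count))
        means.append(mu)
        moments.append(d)
    return counts + means + moments


def compound_natural_vector(seq: str, ks: tuple[int, ...] = (1, 2)) -> list[float]:
    """Hypothetical 'compound' vector: the k-mer natural vectors for each k in ks, concatenated."""
    vec: list[float] = []
    for k in ks:
        vec.extend(kmer_natural_vector(seq, k))
    return vec


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = compound_natural_vector("MKTAYIAKQR")
    b = compound_natural_vector("MKTAYIAKQK")
    print(len(a))  # 3 * (20 + 400) = 1260
    print(euclidean(a, b))
```

A pairwise distance matrix built this way could then be passed to a standard distance-based tree method such as UPGMA or neighbor joining; the abstract does not state which clustering method the authors used.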
Front Plant Sci
October 2024
School of Pharmaceutical Sciences, Yunnan Key Laboratory of Pharmacology for Natural Products, and Yunnan College of Modern Biomedical Industry, Kunming Medical University, Kunming, Yunnan, China.
Nat Biomed Eng
November 2024
Medical Research Center of Genomic Medicine Institute, Seoul National University College of Medicine, Seoul, Republic of Korea.
When used to edit genomes, Cas9 nucleases produce targeted double-strand breaks in DNA. Subsequent DNA-repair pathways can induce large genomic deletions (larger than 100 bp), which constrains the applicability of genome editing. Here we show that Cas9-mediated double-strand breaks induce large deletions at varying frequencies in cancer cell lines, human embryonic stem cells and human primary T cells, and that most deletions are produced by two repair pathways: end resection and DNA-polymerase theta-mediated end joining.
Gene
October 2023
Pritzker Laboratory for Molecular Systematics and Evolution, Field Museum of Natural History, Chicago, Illinois, USA.
The lemon shark, Negaprion brevirostris, is an important species that faces conservation issues and needs genomic resources. Herein, we conducted genome survey sequencing of N. brevirostris, determined its genome size, explored its repetitive elements, and assembled and annotated its 45S rDNA operon and mitochondrial genome.
Bioinformatics
November 2024
School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
Summary: Estimating genome size using k-mer frequencies, which plays a fundamental role in designing genome sequencing and analysis projects, has remained challenging for polyploid species, i.e., ploidy p > 2.
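As background, the basic k-mer approach divides the total number of k-mers counted from sequencing reads by the depth of the main peak of the k-mer frequency histogram. The sketch below is a naive version of that estimator, not the method of the paper summarized above; the histogram format (a dict mapping depth to the number of distinct k-mers at that depth), the function name, and the first-valley cutoff for error k-mers are assumptions for illustration. It also hints at why polyploids are hard: the tallest peak is not guaranteed to be the one the formula assumes, and dividing by the wrong peak scales the estimate by the ratio of the two peak depths.

```python
def estimate_genome_size(hist: dict[int, int]) -> float:
    """Naive k-mer genome-size estimate: total k-mers / depth of the main peak.

    hist maps k-mer depth (multiplicity) -> number of distinct k-mers observed
    at that depth. This input format is an assumption for illustration.
    """
    depths = sorted(hist)
    # Treat everything up to the first local minimum (where counts start
    # rising again) as low-depth k-mers from sequencing errors.
    cutoff = 0
    for lo, hi in zip(depths, depths[1:]):
        if hist[hi] > hist[lo]:
            cutoff = lo
            break
    usable = {d: c for d, c in hist.items() if d > cutoff}
    if not usable:
        raise ValueError("histogram has no usable depths")

    # Total number of (non-error) k-mer occurrences in the reads.
    total_kmers = sum(d * c for d, c in usable.items())
    # Depth of the tallest remaining peak. For polyploid (or highly
    # heterozygous) genomes this may not be the peak the formula assumes.
    peak_depth = max(usable, key=lambda d: usable[d])
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy = {1: 1000, 2: 200, 3: 50, 18: 80, 19: 90, 20: 100, 21: 90, 22: 80}
    print(estimate_genome_size(toy))  # 8800 / 20 = 440.0
```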
BMC Genomics
October 2024
Yunnan Key Laboratory of Plateau Wetland Conservation, Restoration and Ecological Services, National Plateau Wetlands Research Center, Dianchi Lake Ecosystem Observation and Research Station of Yunnan Province, Southwest Forestry University, Kunming, 650224, PR China.
BACKGROUND: Garuga Roxb. is a genus endemic to southwest China and other tropical regions of Southeast Asia, and it faces a risk of extinction due to the loss of tropical forests and changes in land use. Conducting a genome survey of G.