Phylogenetic analysis of protein sequences based on a novel k-mer natural vector method.

Genomics

Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China. Electronic address:

Published: December 2019

Based on the k-mer model of protein sequences, a novel k-mer natural vector method is proposed to characterize the k-mers in a protein sequence, taking into account both the numbers and the distributions of k-mers. It is proved that the correspondence between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily, without requiring evolutionary models or human intervention. However, there is no criterion for choosing a suitable k, and k strongly influences both the results and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. The results of phylogenetic analysis on three protein datasets demonstrate that the new method can precisely describe the evolutionary relationships of proteins and greatly improves computational efficiency.
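The abstract does not spell out the formulas, so the following Python sketch assumes the definitions used in earlier natural-vector work on DNA: for each k-mer w over the 20 standard amino acids, n_w is its number of occurrences, μ_w is the mean of its 1-based start positions, and D_w = Σ(p − μ_w)² / (n_w · n) is its normalized second central moment, with all three set to 0 when w is absent. The "compound" vector is taken here to be the concatenation of the k-mer natural vectors for k = 1, ..., K, and sequences are compared by Euclidean distance. Both choices, and the helper names `kmer_natural_vector`, `compound_natural_vector` and `distance_matrix`, are illustrative assumptions rather than the paper's exact method.

```python
import math
from itertools import product

# Assumed alphabet: the 20 standard amino acids. k-mers containing any
# other letter (e.g. X, B, Z) are simply not counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_natural_vector(seq: str, k: int) -> list[float]:
    """Return [counts..., mean positions..., normalized 2nd moments...] for all 20**k k-mers."""
    n = len(seq)
    positions: dict[str, list[int]] = {}
    for start in range(n - k + 1):
        # Record the 1-based start position of every k-mer occurrence.
        positions.setdefault(seq[start:start + k], []).append(start + 1)

    counts, means, moments = [], [], []
    for letters in product(AMINO_ACIDS, repeat=k):  # fixed lexicographic order
        pos = positions.get("".join(letters), [])
        c = len(pos)
        if c == 0:
            # Absent k-mer: count, mean and moment are all set to 0 (assumption).
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(pos) / c
        # Second central moment normalized by count and sequence length (assumed form).
        d = sum((p - mu) ** 2 for p in pos) / (c * n)
        counts.append(float(c))
        means.append(mu)
        moments.append(d)
    return counts + means + moments


def compound_natural_vector(seq: str, max_k: int) -> list[float]:
    """Hypothetical compound vector: concatenate the k-mer natural vectors for k = 1..max_k."""
    vec: list[float] = []
    for k in range(1, max_k + 1):
        vec.extend(kmer_natural_vector(seq, k))
    return vec


def distance_matrix(seqs: list[str], max_k: int) -> list[list[float]]:
    """Pairwise Euclidean distances between compound natural vectors."""
    vecs = [compound_natural_vector(s, max_k) for s in seqs]
    return [[math.dist(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    seqs = [
        "MKVLAAGIVALLLAAGCSSA",
        "MKVLAAGIVALLLSAGCSSA",  # one substitution relative to the first
        "MSTNPKPQRKTKRNTNRRPQ",  # unrelated composition
    ]
    for row in distance_matrix(seqs, max_k=2):
        print(" ".join(f"{d:8.2f}" for d in row))
```

In this toy run the first two sequences, which differ by a single substitution, end up much closer to each other than to the third. The resulting matrix could then be passed to a standard distance-based tree method such as UPGMA or neighbor joining (not shown). Because each k contributes 3 · 20^k entries, the vector length grows quickly with k, which illustrates why the choice of k affects computational cost.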

Source: http://dx.doi.org/10.1016/j.ygeno.2018.08.010

Similar Publications

First genome assembly and characterization of .

Front Plant Sci

October 2024

School of Pharmaceutical Sciences, Yunnan Key Laboratory of Pharmacology for Natural Products, and Yunnan College of Modern Biomedical Industry, Kunming Medical University, Kunming, Yunnan, China.

Article Synopsis
  • Kalm ex L. is an evergreen shrub valued for its methyl salicylate content and its ornamental and medicinal properties, but it lacks comprehensive genomic data, which prompted this study.
  • Researchers conducted high-throughput sequencing to assemble the genome, obtaining 417 Mb of data with high quality (47.94 Gb) and identifying over 26,000 protein-coding genes and numerous SSRs (simple sequence repeats).
  • The study also identified thousands of transcription factors, transcription regulators, and protein kinases, and performed phylogenetic analyses to gain insights into the genetic relationships among species.

When used to edit genomes, Cas9 nucleases produce targeted double-strand breaks in DNA. Subsequent DNA-repair pathways can induce large genomic deletions (larger than 100 bp), which constrains the applicability of genome editing. Here we show that Cas9-mediated double-strand breaks induce large deletions at varying frequencies in cancer cell lines, human embryonic stem cells and human primary T cells, and that most deletions are produced by two repair pathways: end resection and DNA-polymerase theta-mediated end joining.

The lemon shark (Negaprion brevirostris) is an important species that faces conservation issues and needs genomic resources. Herein, we performed genome survey sequencing of N. brevirostris, determined its genome size, explored its repetitive elements, and assembled and annotated its 45S rRNA DNA operon and mitochondrial genome.

findGSEP: estimating genome size of polyploid species using k-mer frequencies.

Bioinformatics

November 2024

School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

Summary: Estimating genome size using k-mer frequencies, which plays a fundamental role in designing genome sequencing and analysis projects, has remained challenging for polyploid species, i.e., ploidy p > 2.

A tip of the iceberg: genome survey indicated a complex evolutionary history of Garuga Roxb. species.

BMC Genomics

October 2024

Yunnan Key Laboratory of Plateau Wetland Conservation, Restoration and Ecological Services, National Plateau Wetlands Research Center, Dianchi Lake Ecosystem Observation and Research Station of Yunnan Province, Southwest Forestry University, Kunming, 650224, PR China.

BACKGROUND: Garuga Roxb. is a genus endemic to southwest China and other tropical regions of Southeast Asia that faces a risk of extinction due to the loss of tropical forests and changes in land use. Conducting a genome survey of G.
