Tuberculous meningitis is an infectious disease of the central nervous system caused by Mycobacterium tuberculosis (M. tuberculosis). It mainly involves the meninges and brain parenchyma, and can also affect the spinal cord and its meninges. Disability and mortality rates are high.
A comprehensive description of human genomes is essential for understanding human evolution and the relationships between modern populations. However, most published literature focuses on local alignment comparison of several genes rather than the complete evolutionary record of individual genomes. Using data from the 1,000 Genomes Project, we reconstructed 2,504 individual genomes and propose the Divided Natural Vector method to analyze the distribution of nucleotides in the genomes.
Comput Struct Biotechnol J
July 2021
Understanding the relationships between genomic sequences is essential to the classification and characterization of living beings. The classes and characteristics of an organism can be identified in the corresponding genome space. In the genome space, the natural metric is important to describe the distribution of genomes.
Background: Begomoviruses are widely distributed and cause devastating diseases in many crops. According to the number of genomic components, a begomovirus is classified as either monopartite or bipartite. Both monopartite and bipartite begomoviruses have the DNA-A component, which encodes all proteins essential for virus functions, while bipartite begomoviruses additionally contain the DNA-B component.
Chaos Game Representation (CGR) was first proposed as an image representation method for DNA and has been extended to other biological macromolecules. Compared with the CGR images of DNA, where DNA sequences are converted into a series of points in the unit square, the existing CGR images of proteins are less elegant geometrically, and the implications of the distribution of points in the CGR image are less obvious. In this study, by naturally distributing the twenty amino acids on the vertices of a regular dodecahedron, we introduce a novel three-dimensional image representation of protein sequences using the CGR method.
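The DNA case mentioned above, where a sequence becomes a series of points in the unit square, can be illustrated with a minimal sketch. The corner assignment (`CORNERS`), the starting point at the centre of the square, and the function name `cgr_points` are assumptions made for illustration; the abstract does not specify them, and the three-dimensional dodecahedron construction for proteins is not reproduced here.

```python
# Assumed corner assignment: any fixed one-to-one mapping of the four
# nucleotides to the corners of the unit square would serve the same purpose.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the list of CGR points for a DNA sequence.

    Each point is the midpoint between the previous point and the corner
    assigned to the current nucleotide.
    """
    x, y = 0.5, 0.5  # assumed starting point: centre of the unit square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

Because every step halves the distance to a corner, each point encodes the entire prefix of the sequence read so far, which is why the point distribution in the image reflects subsequence frequencies.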
The severe respiratory disease COVID-19 was initially reported in Wuhan, China, in December 2019, and spread from Wuhan to many provinces. The corresponding pathogen was soon identified as a novel coronavirus named SARS-CoV-2 (formerly 2019-nCoV). As of 2 May 2020, over 3 million COVID-19 cases had been confirmed and 235,290 deaths had been reported globally, and the numbers are still increasing.
Advances in sequencing technology have made large amounts of biological data available. Evolutionary analysis of data such as DNA sequences is highly important in biological studies. As alignment methods are ineffective for analyzing large-scale data due to their inherently high costs, alignment-free methods have recently attracted attention in the field of bioinformatics.
HIV-1 is the most common and pathogenic strain of human immunodeficiency virus and consists of many subtypes. To study the differences among HIV-1 subtypes in infection, diagnosis and drug design, it is important to identify HIV-1 subtypes from clinical HIV-1 samples. In this work, we propose an effective numeric representation called Subsequence Natural Vector (SNV) to encode HIV-1 sequences.
Using numerical methods for genome comparison has always been of importance in bioinformatics. The Chaos Game Representation (CGR) is an effective genome sequence mapping technology, which converts genome sequences to CGR images. To each CGR image, we associate a vector called an Extended Natural Vector (ENV).
Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods are impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one correspondence between a DNA sequence and its complete central moment vector of the cumulative Fourier power and phase spectra.
Classification of DNA sequences is an important issue in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are now popular, but the manual intervention in those methods usually decreases their accuracy. In addition, the interactions among nucleotides are neglected in most methods.
This study quantitatively validates the principle that the biological properties associated with a given genotype are determined by the distribution of amino acids. In order to visualize this central law of molecular biology, each protein was represented by a point in 250-dimensional space based on its amino acid distribution. Proteins from the same family are found to cluster together, leading to the principle that the convex hull surrounding protein points from the same family does not intersect with the convex hulls of other protein families.
Acute lung injury (ALI) and acute respiratory distress syndrome (ARDS) are serious diseases characterized by a severe inflammatory response to lung injury and damage to microvascular permeability, frequently resulting in death. YiQiFuMai (YQFM) lyophilized injection powder is a redeveloped preparation based on the well-known traditional Chinese medicine formula Sheng-Mai-San, which is widely used in clinical practice in China, mainly for the treatment of microcirculatory disturbance-related diseases. However, there is little information about its role in ALI/ARDS.
Analyzing phylogenetic relationships using mathematical methods has always been of importance in bioinformatics. Quantitative research may interpret the raw biological data in a precise way. Multiple Sequence Alignment (MSA) is used frequently to analyze biological evolution, but it is very time-consuming.
Evol Bioinform Online
December 2017
We construct a virus database called VirusDB (http://yaulab.math.tsinghua.
marinus, one of the most abundant marine cyanobacteria in the global ocean, is classified into low-light (LL) and high-light (HL) adapted ecotypes. These two ecotypes differ in their ecophysiological characteristics, particularly in whether they are adapted for growth at high or low light intensities. However, some phylogenetic relationships remain to be resolved, such as whether the strains SS120 and MIT9211 form a monophyletic group.
Classification of proteins is a crucial topic in biology. The number of protein sequences stored in databases has increased sharply in the past decade. Traditionally, protein sequences are compared using multiple sequence alignment methods.
Zika virus (ZIKV) is a mosquito-borne flavivirus. It was first isolated in Uganda in 1947 and has been an emerging pathogen since 2007. However, because of the inconsistency of alignment methods, the evolution of ZIKV remains poorly understood.
Due to vast sequence divergence among different viral groups, sequence alignment is not directly applicable to genome-wide comparative analysis of viruses. More and more attention has been paid to alignment-free methods for whole genome comparison and phylogenetic tree reconstruction. Among alignment-free methods, the recently proposed "Natural Vector (NV) representation" has successfully been used to study the phylogeny of multi-segmented viruses based on a 12-dimensional genome space derived from the nucleotide sequence structure.
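As a rough illustration of how a nucleotide sequence can be mapped to a point in a 12-dimensional space, the sketch below computes, for each of A, C, G and T, the count, the mean position, and a normalized second central moment of the positions. This follows the commonly cited form of the natural vector, but the abstract itself does not give the formula, so the normalization by `n_k * n`, the 1-based positions, and the function name `natural_vector` should be read as assumptions rather than the authors' exact definition.

```python
def natural_vector(sequence, alphabet="ACGT"):
    """Return a 12-dimensional vector: counts, mean positions, second moments.

    Assumptions (not stated in the abstract): positions are 1-based, and the
    second central moment of letter k is divided by n_k * n, where n is the
    sequence length and n_k is the number of occurrences of letter k.
    """
    seq = sequence.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for letter in alphabet:
        positions = [i + 1 for i, base in enumerate(seq) if base == letter]
        n_k = len(positions)
        if n_k == 0:
            # A letter that never occurs contributes zeros by convention.
            counts.append(0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(n_k)
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


if __name__ == "__main__":
    print(natural_vector("AACG"))
```

Two genomes can then be compared by an ordinary distance, for example the Euclidean distance, between their vectors, which is what makes the approach alignment-free.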
The free-living SAR11 clade is a globally abundant group of oceanic Alphaproteobacteria, with small genome sizes and rich genomic A+T content. However, the taxonomy of SAR11 has become controversial recently. Some researchers argue that the position of SAR11 is a sister group to Rickettsiales.
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Despite the many methods available for sequence comparison, few retain the information content of sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphical curves of DNA or protein sequences.
According to the WHO, ebolaviruses have resulted in 8818 human deaths in West Africa as of January 2015. To better understand the evolutionary relationship of the ebolaviruses and infer virulence from the relationship, we applied the alignment-free natural vector method to classify the newest ebolaviruses. The dataset includes three new Guinea viruses as well as 99 viruses from Sierra Leone.
What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein.
We have recently developed a computational approach in a vector space for genome-based virus classification. This alignment-free approach, called the "Natural Vector (NV) representation", allows us to classify single-segmented viruses with high speed and accuracy. For multiple-segmented viruses, phylogenetic trees of each segment are typically reconstructed to discover viral phylogeny.
Intron-containing and intronless genes have different biological properties and statistical characteristics. Here we propose a new computational method to distinguish between intron-containing and intronless gene sequences. Seven feature parameters α, β, γ, λ, θ, φ and σ based on detrended fluctuation analysis (DFA) are fully used, and thus we can compute a 7-dimensional feature vector for any given gene sequence to be discriminated.
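The abstract does not define the seven parameters, so the following is only a generic sketch of first-order detrended fluctuation analysis on a gene sequence, not the authors' feature extraction. The purine/pyrimidine encoding in `to_numeric`, the window sizes, and all function names are assumptions chosen for illustration.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_numeric(sequence):
    """Assumed encoding: purines (A, G) -> +1, pyrimidines (C, T) -> -1."""
    return [1.0 if b in "AG" else -1.0 for b in sequence.upper() if b in "ACGT"]


def dfa_fluctuation(series, window):
    """Root-mean-square fluctuation F(s) for one window size s (window >= 2)."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:  # integrate the mean-subtracted series
        total += value - mean
        profile.append(total)
    n_windows = len(profile) // window
    xs = list(range(window))
    squared = 0.0
    for w in range(n_windows):  # remove a local linear trend in each window
        segment = profile[w * window:(w + 1) * window]
        slope, intercept = linear_fit(xs, segment)
        squared += sum((y - (slope * x + intercept)) ** 2
                       for x, y in zip(xs, segment))
    return math.sqrt(squared / (n_windows * window))


def dfa_exponent(series, windows):
    """Slope of log F(s) against log s; windows with F(s) == 0 are skipped."""
    pairs = [(s, dfa_fluctuation(series, s)) for s in windows]
    logs = [(math.log(s), math.log(f)) for s, f in pairs if f > 0.0]
    slope, _ = linear_fit([p[0] for p in logs], [p[1] for p in logs])
    return slope


if __name__ == "__main__":
    series = to_numeric("ACGGTTAACCGGATATCGCG" * 10)
    print(dfa_exponent(series, [4, 8, 16, 32]))
```

A scaling exponent of this kind is one plausible component of a DFA-based feature vector; which quantities the seven parameters α, β, γ, λ, θ, φ and σ actually denote cannot be inferred from the abstract.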