Exploration of multivariate analysis in microbial coding sequence modeling.

Tahir Mehmood, Jon Bohlin, Anja Bråthen Kristoffersen, Solve Sæbø, Jonas Warringer, Lars Snipen

BMC Bioinformatics

Biostatistics, Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, Aas, Norway.

Published: May 2012

Background: Gene finding is a complicated procedure that encapsulates algorithms for coding sequence modeling, identification of promoter regions, issues concerning overlapping genes and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identification and prediction of the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov model (IMM). Comparisons between the methods were performed on DNA, codon and protein sequences with highly conserved genes taken from several species with different genomic properties.
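
To make the Markov-model baseline mentioned above concrete, here is a minimal sketch of scoring a sequence with a fixed-order Markov chain. It is a deliberate simplification: a true IMM interpolates between several context lengths, which this sketch does not do. The function names (`train_markov`, `log_likelihood`, `coding_score`), the order of 2, and the add-one pseudocount are assumptions for illustration, not the setup used in the study. Sequences are assumed to contain only A, C, G and T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes

def train_markov(seqs, order=2, pseudo=1.0):
    """Count (context -> next symbol) transitions for a fixed-order chain."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0
    return {"order": order, "pseudo": pseudo, "counts": counts}

def log_likelihood(model, seq):
    """Sum of smoothed log transition probabilities along the sequence."""
    order, pseudo, counts = model["order"], model["pseudo"], model["counts"]
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        row = counts.get(seq[i - order:i], {})
        denom = sum(row.values()) + pseudo * len(ALPHABET)
        total += math.log((row.get(seq[i], 0.0) + pseudo) / denom)
    return total

def coding_score(coding_model, background_model, seq):
    """Log-likelihood ratio: positive values favour the coding model."""
    return log_likelihood(coding_model, seq) - log_likelihood(background_model, seq)
```

A sequence would then be called coding when `coding_score` exceeds some threshold; choosing that threshold is outside the scope of this sketch.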

Results: The multivariate CPPLS approach classified coding sequences substantially better than the commonly used IMM on the same set of sequences. We also found that the use of CPPLS with codon representation gave significantly better classification results than IMM with either protein (p < 0.001) or DNA (p < 0.001) representation. Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001).

Conclusions: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.
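
As an illustration of the codon frequencies referred to above, the following sketch turns a DNA sequence into a 64-element vector of relative codon frequencies, the kind of numeric representation a multivariate classifier can take as input. The name `codon_frequencies`, the reading frame starting at position 0, and the choice to skip codons containing non-ACGT symbols are assumptions for illustration; the abstract does not specify how the study built its features.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed (alphabetical) order, so vectors are comparable.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)

def codon_frequencies(seq):
    """Relative frequency of each codon, reading in frame from position 0."""
    seq = seq.upper()
    # Non-overlapping triplets; a trailing partial codon is ignored.
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]
```

Each sequence then becomes one row of a feature matrix with 64 columns, which could be paired with a coding/non-coding label for a supervised method.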

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3473301
DOI: http://dx.doi.org/10.1186/1471-2105-13-97

Similar Publications

Complete genome sequence of Pseudarthrobacter sp. NIBRBAC000502770 from coal mine of Hongcheon on Republic of Korea.

BMC Genom Data

January 2025

Department of Applied Biosciences, College of Agriculture and Life Sciences, Kyungpook National University, Daegu, 41566, Republic of Korea.

Min-Kyu Park, Yeong-Jun Park, Myung-Suk Kang, Min-Ha Kim, Soo-Young Kim

Objectives: The data were collected to obtain the complete genome sequence of Pseudarthrobacter sp. NIBRBAC000502770, isolated from the rhizosphere of Sasamorpha in a heavy metal-contaminated coal mine in Hongcheon, Republic of Korea. The objective was to explore the strain's genetic potential for plant growth promotion and heavy metal resistance, particularly arsenate and copper.

Large donor CRISPR for whole-CDS replacement of cell adhesion molecule LRRTM2.

J Neurosci

January 2025

Department of Physiology, University of Maryland School of Medicine, Baltimore, MD, USA

Stephanie L Pollitt, Aaron D Levy, Michael C Anderson, Thomas A Blanpied

The cell adhesion molecule Leucine-Rich Repeat Transmembrane neuronal protein 2 (LRRTM2) is crucial for synapse development and function. However, our understanding of its endogenous trafficking has been limited due to difficulties in manipulating its coding sequence (CDS) using standard genome editing techniques. Instead, we replaced the entire LRRTM2 CDS by adapting a two-guide CRISPR knock-in method, enabling complete control of LRRTM2.

Genomic insights into a multidrug-resistant Pandoraea apista clinical isolate carrying bla from China.

J Glob Antimicrob Resist

January 2025

Clinical Laboratory Department, Lishui People's Hospital, the Sixth Affiliated Hospital of Wenzhou Medical University, Lishui, China.

Lirong Li, Yawen Zhang, Fang He, Ningjun Wu

Objectives: Pandoraea apista is notable for its multidrug resistance and is frequently identified in patients with cystic fibrosis or other chronic lung diseases, where it contributes to persistent lung infections. In this study, we describe a strain of P. apista harboring the bla, isolated from the bronchoalveolar lavage (BAL) fluid of an inpatient in China.

The genome sequence of the Poplar Grey moth, (Denis & Schiffermüller, 1775).

Wellcome Open Res

November 2024

UK Centre for Ecology & Hydrology, Wallingford, England, UK.

Douglas Boyes

We present a genome assembly from an individual male (Poplar Grey moth; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence has a total length of 424.20 megabases.

First complete genome sequence of tulip mild mottle mosaic virus (Ophiovirus tulipae).

Arch Virol

January 2025

School of Agriculture, Utsunomiya University, 350 Mine-machi, Utsunomiya, Tochigi, 321-8505, Japan.

Yutaro Neriya, Kakeru Hamamoto, Tominari Kobayashi, Shunsuke Nakase, Rena Kurosawa

Tulip mild mottle mosaic disease, caused by tulip mild mottle mosaic virus (TMMMV, species Ophiovirus tulipae), was first reported in Japan in 1979. TMMMV has a negative-sense ssRNA genome and is closely related to ophioviruses such as Mirafiori lettuce big vein virus (MLBVV, Ophiovirus mirafioriense). However, its complete nucleotide sequence has not yet been reported.
