Motivation: Accurate alignment of large numbers of sequences is computationally demanding, and the burden grows further when downstream analyses depend on these alignments. With the abundance of sequence data, an attractive option is an integrative approach that adds new sequences to existing alignments without fully recomputing them, while maintaining the relative matching of the existing sequences. Another current challenge is the extension of reference alignments with fragmented sequences, such as those from next-generation metagenomics, which contain relatively little information. Widely used methods for alignment extension are based on a profile representation of the reference sequences. These methods do not use phylogenetic information and are affected by the composition of the reference alignment and the phylogenetic positions of the query sequences.
Results: We have developed a method for phylogeny-aware alignment of partial-order sequence graphs and apply it here to the extension of alignments with new data. Our new method, called PAGAN, infers ancestral sequences for the reference alignment and adds new sequences in their phylogenetic context, either to predefined positions or by finding the best placement for sequences of unknown origin. Unlike profile-based alternatives, PAGAN considers the phylogenetic relatedness of the sequences and is not affected by inclusion of more diverged sequences in the reference set. Our analyses show that PAGAN outperforms alternative methods for alignment extension and provides superior accuracy for both DNA and protein data, the improvement being especially large for fragmented sequences. Moreover, PAGAN-generated alignments of noisy next-generation sequencing (NGS) sequences are accurate enough for the use of RNA-seq data in evolutionary analyses.
Availability: PAGAN is written in C++, licensed under the GPL and its source code is available at http://code.google.com/p/pagan-msa.
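The placement step mentioned in the Results (finding the best position for a sequence of unknown origin) can be illustrated with a minimal sketch. This is not PAGAN's algorithm: PAGAN aligns partial-order sequence graphs, whereas the sketch below assumes plain strings for the inferred node sequences and a simple match/mismatch/gap scoring scheme. The names `overlap_score`, `best_placement`, and the node labels are hypothetical and chosen only for illustration.

```python
def overlap_score(query, target, match=1, mismatch=-1, gap=-2):
    """Score the best alignment of the whole query against any stretch of target.

    Unaligned target prefixes and suffixes are free, so a short fragment is
    not penalised for covering only part of a longer reference sequence.
    Scoring values are illustrative assumptions, not PAGAN's model.
    """
    prev = [0] * (len(target) + 1)  # skipping a target prefix costs nothing
    for i in range(1, len(query) + 1):
        curr = [i * gap] + [0] * len(target)
        for j in range(1, len(target) + 1):
            same = query[i - 1] == target[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return max(prev)  # skipping a target suffix costs nothing


def best_placement(query, node_sequences):
    """Return the tree node whose sequence the query fragment matches best."""
    scores = {node: overlap_score(query, seq)
              for node, seq in node_sequences.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: two inferred ancestral nodes and one leaf.
nodes = {
    "anc1": "ACGTACGTAC",
    "anc2": "ACGTTCGTAC",
    "leafA": "ACCTACGAAC",
}
best, scores = best_placement("GTTCGT", nodes)
print(best)  # anc2
```

Scoring the fragment against ancestral as well as extant sequences is what lets the query be placed in its phylogenetic context rather than against a single profile of the whole reference set.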
| Full-text link | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3381962 | PMC |
| http://dx.doi.org/10.1093/bioinformatics/bts198 | DOI |