Protein or DNA motifs are sequence regions of biological importance, and they are often highly conserved among homologous sequences. Generating multiple sequence alignments (MSAs) in which the conserved sequence motifs are correctly aligned remains difficult, because the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm to address this shortcoming. The method incorporates explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments, we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments, we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and to show how the motif-aware alignment method can be employed to alleviate them.
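The idea of boosting substitution scores inside the dynamic programming step can be illustrated with a small pairwise sketch. This is not PRALINE's implementation, and the abstract does not give the exact form of the boost. The sketch assumes an additive bonus of α whenever both aligned positions are flagged as belonging to a motif, a toy match/mismatch score in place of a real substitution matrix, and a linear gap penalty. The names `base_score`, `motif_aware_score` and `align` are hypothetical.

```python
from typing import List, Sequence, Tuple

GAP = -2.0  # toy linear gap penalty (assumption, not a PRALINE default)


def base_score(a: str, b: str) -> float:
    """Toy substitution score standing in for a real matrix such as BLOSUM62."""
    return 1.0 if a == b else -1.0


def motif_aware_score(a: str, b: str, a_in_motif: bool, b_in_motif: bool,
                      alpha: float) -> float:
    """Base score plus an additive boost when both positions lie in a motif.

    The additive form is an assumption made for illustration only.
    """
    score = base_score(a, b)
    if a_in_motif and b_in_motif:
        score += alpha
    return score


def align(seq1: str, seq2: str, motif1: Sequence[bool], motif2: Sequence[bool],
          alpha: float) -> Tuple[float, str, str]:
    """Global (Needleman-Wunsch) alignment using the motif-boosted score."""
    if len(motif1) != len(seq1) or len(motif2) != len(seq2):
        raise ValueError("each motif mask must match its sequence length")
    n, m = len(seq1), len(seq2)

    # Fill the score matrix; row 0 and column 0 hold leading-gap costs.
    F: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + GAP
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + motif_aware_score(
                seq1[i - 1], seq2[j - 1], motif1[i - 1], motif2[j - 1], alpha)
            F[i][j] = max(diag, F[i - 1][j] + GAP, F[i][j - 1] + GAP)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out1: List[str] = []
    out2: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + motif_aware_score(
                seq1[i - 1], seq2[j - 1], motif1[i - 1], motif2[j - 1], alpha):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out1)), "".join(reversed(out2))
```

With `alpha = 0.0` the sketch reduces to ordinary Needleman-Wunsch under the toy scoring. Raising `alpha` makes paths that pair motif positions with each other increasingly attractive, which mirrors the trade-off between motif matching and overall alignment quality described above.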
Source | Link
---|---
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6233922
DOI | http://dx.doi.org/10.1371/journal.pcbi.1006547
CRISPR J
January 2025
Department of Microbiology and Cell Biology, Montana State University, Bozeman, Montana, USA.
Bacteria and archaea acquire resistance to genetic parasites by preferentially integrating short fragments of foreign DNA at one end of a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR). "Leader" DNA upstream of CRISPR loci regulates transcription and foreign DNA integration into the CRISPR. Here, we analyze 37,477 CRISPRs from 39,277 bacterial and 556 archaeal genomes to identify conserved sequence motifs in CRISPR leaders.
Curr Microbiol
January 2025
Coastar Therapeutics, San Diego, CA, 92126, USA.
Staphylococcus epidermidis (S. epidermidis) live in different human locations and natural environments. For ribotyping S.
Biochem J
January 2025
University of Pittsburgh School of Medicine, Pittsburgh, United States.
The sodium phosphate cotransporter-2A (NPT2A) mediates basal and parathyroid hormone (PTH)- and fibroblast growth factor-23 (FGF23)-regulated phosphate transport in proximal tubule cells of the kidney. Both basal and hormone-sensitive transport require sodium hydrogen exchanger regulatory factor-1 (NHERF1), a scaffold protein with tandem PDZ domains, PDZ1 and PDZ2. NPT2A binds to PDZ1.
HLA
January 2025
Immunology Unit, Clinical Analysis Department, Albacete University Hospital Complex, Albacete, Spain.
HLA-DRB1*08:130 shows a Leucine at position 64 not described previously.
Brief Bioinform
November 2024
Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518124, China.
Identifying the regulatory effects of noncoding variants presents a significant challenge. Recently, the accumulation of epigenomic profiling data in wheat has provided an opportunity to model the functional impacts of these variants. In this study, we introduce Language of Genome for Wheat (LOGOWheat), a deep learning-based tool designed to predict the regulatory effects of noncoding variants in wheat.