AI Article Synopsis

  • Singular value decomposition (SVD) is a rigorous method for analyzing multiple sequence alignments (MSAs): it identifies sequence subgroups and extracts the consensus and covariance features that relate to structure and function.
  • The article explains the mathematics of SVD intuitively, then uses sequences generated from a simplified model to show how sequence conservation and covariance produce features in the decomposed coordinate system.
  • The study applies SVD to two protein families, the homeodomain and Ras superfamilies, and finds clear sequence clustering in both. It also describes Python scripts, freely available on GitHub, that readers can use to run their own SVD analyses of MSAs.

Article Abstract

Singular value decomposition (SVD) of multiple sequence alignments (MSAs) is an important and rigorous method to identify subgroups of sequences within the MSA, and to extract consensus and covariance sequence features that define the alignment and distinguish the subgroups. This information can be correlated to structure, function, stability, and taxonomy. However, the mathematics of SVD is unfamiliar to many in the field of protein science. Here, we attempt to present an intuitive yet comprehensive description of SVD analysis of MSAs. We begin by describing the underlying mathematics of SVD in a way that is both rigorous and accessible. Next, we use SVD to analyze sequences generated with a simplified model in which the extent of sequence conservation and covariance between different positions is controlled, to show how conservation and covariance produce features in the decomposed coordinate system. We then use SVD to analyze alignments of two protein families, the homeodomain and the Ras superfamilies. Both families show clear evidence of sequence clustering when projected into singular value space. We use k-means clustering to group MSA sequences into specific clusters, show how the residues that distinguish these clusters can be identified, and show how these clusters can be related to taxonomy and function. We end by providing a description of a set of Python scripts that can be used for SVD analysis of MSAs, displaying results, and identifying and analyzing sequence clusters. These scripts are freely available on GitHub.
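The workflow summarized in the abstract (encode the alignment, decompose it, project the sequences, cluster them) can be illustrated with a minimal sketch. This is not the authors' GitHub code. The one-hot encoding with one row per sequence, the 21-symbol alphabet, the toy alignment, the function names, and the farthest-point start for k-means are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed: 20 amino acids plus a gap symbol


def one_hot_encode(msa):
    """Return a binary matrix with one row per sequence and
    len(ALPHABET) columns per alignment position."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    n_seqs, length, n_sym = len(msa), len(msa[0]), len(ALPHABET)
    X = np.zeros((n_seqs, length * n_sym))
    for s, seq in enumerate(msa):
        for pos, aa in enumerate(seq):
            X[s, pos * n_sym + index[aa]] = 1.0
    return X


def sequence_coordinates(X):
    """SVD of the encoded alignment. Row i of the first return value
    holds the coordinates of sequence i along each singular axis."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U * S, S


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start
    (an illustrative choice, not taken from the article)."""
    centers = [points[0]]
    for _ in range(1, k):
        # distance from every point to its nearest existing center
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical toy alignment: two subgroups that differ at two covarying
# positions (K/L versus R/D), plus one V/I variation inside each subgroup.
msa = ["MKVLA", "MKVLA", "MKILA", "MRVDA", "MRVDA", "MRIDA"]

X = one_hot_encode(msa)
coords, singular_values = sequence_coordinates(X)

# The encoded matrix is not mean-centered, so the first singular axis mostly
# reflects what all sequences share; cluster on the second and third axes.
labels = kmeans(coords[:, 1:3], k=2)
print(labels)
```

In this toy case the second singular axis separates the K/L sequences from the R/D sequences, so k-means assigns the first three and the last three sequences to different clusters. Real alignments need additional handling (non-standard residue symbols, sequence weighting, choosing the number of clusters) that this sketch leaves out.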

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9514065
DOI: http://dx.doi.org/10.1002/pro.4422

Publication Analysis

Top Keywords

singular decomposition: 8
mathematics svd: 8
svd analysis: 8
analysis msas: 8
svd analyze: 8
conservation covariance: 8
svd: 7
sequence: 6
decomposition protein: 4
sequences: 4
