Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. 
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies, forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
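The clustering step described above, in which families linked by profile matches are grouped into potential superfamilies, can be illustrated with a minimal sketch. The abstract does not state the grouping rule, so this sketch assumes single-linkage grouping (connected components) over pairwise hits that pass an E-value cutoff. The family names, E-values, cutoff, and the function name `cluster_families` are all hypothetical and are not taken from SUPFAM or IMPALA.

```python
from collections import defaultdict


def cluster_families(hits, evalue_cutoff=1e-3):
    """Group families into connected components of significant pairwise hits.

    hits: iterable of (query_family, target_family, evalue) tuples.
    The cutoff is an illustrative assumption, not a value from the source.
    """
    parent = {}

    def find(x):
        # Locate the representative of x, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, target, evalue in hits:
        # Ignore self-matches and hits weaker than the cutoff.
        if query != target and evalue <= evalue_cutoff:
            union(query, target)

    groups = defaultdict(set)
    for family in list(parent):
        groups[find(family)].add(family)

    # Report only clusters that join at least two families.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Hypothetical all-against-all results (made-up names and E-values).
example_hits = [
    ("FamA", "FamB", 1e-8),
    ("FamB", "FamC", 1e-5),
    ("FamD", "FamE", 1e-4),
    ("FamF", "FamA", 0.5),    # too weak, FamF stays unclustered
    ("FamG", "FamG", 1e-50),  # self-match, ignored
]

clusters = cluster_families(example_hits)
for members in clusters:
    print(members)
```

Single linkage is the simplest choice here: FamA and FamC end up in the same group through FamB even without a direct match between them. A stricter rule would produce smaller groups.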
Source | Link |
---|---|
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC99061 |
DOI | http://dx.doi.org/10.1093/nar/30.1.289 |
Int J Mol Sci
December 2024
College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China.
The heat shock protein 70 (HSP70) family plays an important role in the growth and development of lettuce and in the defense response to high-temperature stress; however, its bioinformatics analysis in lettuce has been extremely limited. Genome-wide bioinformatics analysis methods such as chromosome location, phylogenetic relationships, gene structure, collinearity analysis, and promoter analysis were performed in the gene family, and the expression patterns in response to high-temperature stress were analyzed. The mechanism of in heat resistance in lettuce was studied by virus-induced gene silencing (VIGS) and transient overexpression techniques.
Bioinformatics
December 2024
Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States.
Motivation: Due to the breakthrough in protein structure prediction by AlphaFold, the scientific community has access to 200 million predicted protein structures with near-atomic accuracy from the AlphaFold protein structure DataBase (AFDB), covering nearly the entire protein universe. Segmenting these models into domains and classifying them into an evolutionary hierarchy hold tremendous potential for unraveling essential insights into protein function.
Results: We introduce DPAM-AI, a Domain Parser for AlphaFold Models based on Artificial Intelligence.
Extremophiles
December 2024
School of Life Sciences, University of Nevada Las Vegas, Las Vegas, USA.
Among the many ice-binding proteins (IBPs) found in microorganisms (bacteria, archaea, fungi and algae), the canonical DUF3494 beta-barrel type is the most common. Until now, little variation has been found in this structure: an initial coil leads into an alpha helix that directs the following coils into a reverse stack, with the final coil ending up next to the initial coil. Here, I show that there exist many bacterial proteins whose AlphaFold-predicted structures deviate from the DUF3494 structure so that they are not recognized as belonging to an existing DUF or Pfam family.
Nucleic Acids Res
January 2025
IQVIA Ltd., The Point, 37 North Wharf Road, London W2 1AF, UK.
The 2025 Nucleic Acids Research database issue contains 185 papers spanning biology and related areas. Seventy three new databases are covered, while resources previously described in the issue account for 101 update articles. Databases most recently published elsewhere account for a further 11 papers.