Background: Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown or vaguely known function, and a large number are wrongly annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction integrates comparative genomics based mainly on microbial genomes with functional genomic data from model microorganisms and post-genomic data from plants. This approach bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is more powerful than purely computational approaches to identifying gene-function associations.
Results: Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) occur in prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family. In-depth comparative genomic analysis was performed for 360 top candidate families. From this pool, 78 families were connected to general areas of metabolism and, of these families, specific functional predictions were made for 41. Twenty-one predicted functions have been experimentally tested or are currently under investigation by our group in at least one prokaryotic organism (nine of them have been validated, four invalidated, and eight are in progress). Ten additional predictions have been independently validated by other groups. Discovering the function of very widespread but hitherto enigmatic proteins such as the YrdC or YgfZ families illustrates the power of our approach.
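The three selection criteria and the validation tallies reported above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `GeneFamily` record, the annotation labels, and the toy family names are all hypothetical, and the homology-independent SEED-based ranking is not modelled. Only the thresholds (no more than three members) and the counts (9, 4, 8, 10) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneFamily:
    name: str                 # hypothetical family identifier
    arabidopsis_members: int  # number of Arabidopsis genes in the family
    in_prokaryotes: bool      # True if homologs occur in prokaryotes
    annotation: str           # "known", "poor", or "unknown" (assumed labels)


def is_candidate(family: GeneFamily, max_members: int = 3) -> bool:
    """Apply criteria (i)-(iii): small family, prokaryotic homologs, unclear function."""
    small = 1 <= family.arabidopsis_members <= max_members
    uncharacterized = family.annotation in {"unknown", "poor"}
    return small and family.in_prokaryotes and uncharacterized


# Toy input, invented for illustration only.
families = [
    GeneFamily("famA", 1, True, "unknown"),
    GeneFamily("famB", 5, True, "unknown"),  # fails (i): too many members
    GeneFamily("famC", 2, False, "poor"),    # fails (ii): no prokaryotic homologs
    GeneFamily("famD", 3, True, "poor"),
    GeneFamily("famE", 2, True, "known"),    # fails (iii): function already known
]
candidates = [f.name for f in families if is_candidate(f)]

# Tallies quoted in the Results and Conclusions.
tested = {"validated": 9, "invalidated": 4, "in_progress": 8}
validated_by_others = 10
total_tested = sum(tested.values())
total_confirmed = tested["validated"] + validated_by_others
```

With the toy input, only `famA` and `famD` pass all three criteria. The tallies sum to the 21 tested predictions stated in the Results and the 19 correctly predicted functions stated in the Conclusions.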
Conclusions: Our approach correctly predicted functions for 19 uncharacterized protein families from plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The resulting annotations could be propagated with confidence to over six thousand homologous proteins encoded in over 900 bacterial, archaeal, and eukaryotic genomes currently available in public databases.
Source | Link |
---|---|
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3223725 |
DOI | http://dx.doi.org/10.1186/1471-2164-12-S1-S2 |
Plant Genome
March 2025
Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou, China.
Winter barley (Hordeum vulgare) production areas in the middle and lower reaches of the Yangtze River are severely threatened by barley yellow mosaic disease, which is caused by Barley yellow mosaic virus and Barley mild mosaic virus. Improving barley disease resistance in breeding programs requires knowledge of genetic loci in germplasm resources. In this study, bulked segregant analysis (BSA) identified a novel major quantitative trait locus (QTL) QRym.
Changes in the copy number of large genomic regions, termed copy number variations (CNVs), contribute to important phenotypes in many organisms. CNVs are readily identified using conventional approaches when present in a large fraction of the cell population. However, CNVs that are present in only a few genomes across a population are often overlooked but important; if beneficial under specific conditions, a de novo CNV that arises in a single genome can expand during selection to create a larger population of cells with novel characteristics.
The distribution of fitness effects (DFE) characterizes the range of selection coefficients from which new mutations are sampled, and thus holds a fundamentally important role in evolutionary genomics. To date, DFE inference in primates has been largely restricted to haplorrhines, with limited data availability leaving the other suborder of primates, strepsirrhines, largely under-explored. To advance our understanding of the population genetics of this important taxonomic group, we here map exonic divergence in aye-ayes ( ) - the only extant member of the Daubentoniidae family of the Strepsirrhini suborder.
Prokaryote evolution is driven in large part by the incessant arms race with viruses. Genomic investments in antivirus defense can be coarsely classified into two categories, immune systems that abrogate virus reproduction resulting in clearance, and altruistic programmed cell death (PCD) systems. Prokaryotic defense systems are enormously diverse, as revealed by an avalanche of recent discoveries, but the basic ecological determinants of defense strategy remain poorly understood.
Ecol Evol
January 2025
United States Fish and Wildlife Service, Texas Fish and Wildlife Conservation Office San Marcos Texas USA.
Karst ecosystems often contain extraordinary biodiversity, but the complex underground aquifers of karst regions present challenges for assessing and conserving stygobiont diversity and investigating their evolutionary history. We examined the karst-obligate salamanders of the species complex in the Edwards Plateau region of central Texas using population genomics data to address questions about population connectivity and the potential for gene exchange within the underlying aquifer system. The species complex has historically been divided into three nominal species, but their status, and spatial extent of species ranges, have remained uncertain.