Structural genomics projects are major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets, we discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline: cloning, expression, purification, and, ultimately, structure determination. First, we use tree-based analyses (decision trees and random forest algorithms) to identify the protein features that most strongly influence a protein's amenability to high-throughput experimentation. Building on this, we identify potential bottlenecks at various stages of the structural genomics process using specialized "pipeline schematics". We find that the most significant properties of a protein are: (i) whether it is conserved across many organisms; (ii) its percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, several other properties that might have been expected to matter, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from http://mining.nesg.org/.
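Tree-based methods commonly choose which feature to split on by information gain: the drop in entropy of the outcome (here, whether a structure was determined) after partitioning the targets on one feature. The sketch below ranks features this way. It is a minimal illustration of that split criterion, not the authors' pipeline: the feature names (`conserved`, `hydrophobic_patch`, `has_nls`), the eight toy targets, and the helper functions are all hypothetical. Features are assumed to be pre-binarized (continuous properties such as length or charged-residue percentage would first need a threshold), and a random forest would repeat this selection across many trees grown on bootstrapped samples.

```python
import math
from collections import Counter

# Hypothetical toy targets: (conserved, hydrophobic_patch, has_nls, structure_solved)
TOY = [
    (True,  False, True,  True),
    (True,  False, False, True),
    (True,  True,  True,  True),
    (True,  False, False, True),
    (False, True,  True,  False),
    (False, True,  False, False),
    (False, False, True,  False),
    (False, True,  False, False),
]
FEATURES = ("conserved", "hydrophobic_patch", "has_nls")
ROWS = [dict(zip(FEATURES, t[:3])) for t in TOY]
LABELS = [t[3] for t in TOY]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the targets on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    scores = {f: information_gain(rows, labels, f) for f in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, gain in rank_features(ROWS, LABELS):
        print(f"{name:18s} {gain:.3f} bits")
```

The toy data were constructed so that conservation separates solved from unsolved targets perfectly (a gain of 1 bit), the hydrophobic-patch flag is weakly informative (about 0.19 bits), and the NLS flag carries no information; real rankings would come from the actual target data.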

Source: http://dx.doi.org/10.1016/j.jmb.2003.11.053
