Protein crystallization is a major bottleneck in protein X-ray crystallography, the workhorse of most structural proteomics projects. Because the principles that govern protein crystallization are too poorly understood to allow them to be used in a strongly predictive sense, the most common crystallization strategy entails screening a wide variety of solution conditions to identify the small subset that will support crystal nucleation and growth. We tested the hypothesis that more efficient crystallization strategies could be formulated by extracting useful patterns and correlations from the large data sets of crystallization trials created in structural proteomics projects. A database of crystallization conditions was constructed for 755 different proteins purified and crystallized under uniform conditions. Forty-five percent of the proteins formed crystals. Data mining identified the conditions that crystallize the most proteins, revealed that many conditions are highly correlated in their behavior, and showed that the crystallization success rate is markedly dependent on the organism from which proteins derive. Of the proteins that crystallized in a 48-condition experiment, 60% could be crystallized in as few as 6 conditions and 94% in 24 conditions. Consideration of the full range of information coming from crystal screening trials allows one to design screens that are maximally productive while consuming minimal resources, and also suggests further useful conditions for extending existing screens.
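The idea of shrinking a 48-condition screen to a handful of conditions that still capture most crystallizable proteins can be illustrated as a coverage problem. The sketch below is only an illustration, not the authors' method: the abstract does not say which algorithm was used, so a simple greedy set-cover heuristic is assumed here, and the function name `greedy_screen`, the condition labels, and the toy hit table are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_screen(
    hits: Dict[str, Set[str]], max_conditions: int
) -> List[Tuple[str, float]]:
    """Pick conditions one at a time, each time taking the condition that
    crystallizes the most proteins not yet covered by earlier picks.

    hits maps a condition label to the set of proteins that crystallized in it.
    Returns (condition, cumulative fraction of crystallizable proteins covered).
    Assumption: greedy set cover, chosen here for illustration only.
    """
    all_proteins: Set[str] = set().union(*hits.values()) if hits else set()
    covered: Set[str] = set()
    chosen: List[Tuple[str, float]] = []
    remaining = dict(hits)

    while remaining and len(chosen) < max_conditions and covered != all_proteins:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining condition is redundant with the picks so far
        covered |= gain
        chosen.append((best, len(covered) / len(all_proteins)))
    return chosen


# Hypothetical toy data: cond_C is fully redundant with cond_A, mimicking
# conditions that are highly correlated in their behavior.
toy_hits = {
    "cond_A": {"p1", "p2", "p3"},
    "cond_B": {"p3", "p4"},
    "cond_C": {"p1", "p2"},
    "cond_D": {"p5"},
}

for condition, fraction in greedy_screen(toy_hits, max_conditions=4):
    print(f"{condition}: {fraction:.0%} of crystallizable proteins covered")
```

In this toy table the redundant condition is never selected, which is the intuition behind a reduced screen: correlated conditions add cost without adding newly crystallized proteins. A greedy pick is not guaranteed to find the smallest possible screen, so the coverage it reports is a lower bound on what a fixed number of conditions can achieve.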

Source
http://dx.doi.org/10.1002/prot.10340
