A large expert-curated cryo-EM image dataset for machine learning protein particle picking.

Sci Data

Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA.

Published: June 2023

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by the lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking.
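
As a rough illustration of how labelled particle coordinates of this kind might be used, the sketch below scores a picker's output against ground-truth labels with precision and recall. It is not taken from the CryoPPP paper: the function names, the greedy nearest-pair matching rule, and the distance threshold `max_dist` are all assumptions made for this example, and coordinates are taken to be (x, y) particle centers in pixels.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed (x, y) particle center in pixels


def match_particles(truth: List[Point], picked: List[Point], max_dist: float) -> int:
    """Count one-to-one matches between labels and picks.

    Greedy rule (an assumption for this sketch): consider every label/pick
    pair whose centers lie within max_dist, closest pairs first, and accept
    a pair only if neither point has been matched already.
    """
    pairs = []
    for i, t in enumerate(truth):
        for j, p in enumerate(picked):
            d = math.dist(t, p)
            if d <= max_dist:
                pairs.append((d, i, j))
    pairs.sort()

    used_truth, used_picked = set(), set()
    matches = 0
    for _, i, j in pairs:
        if i not in used_truth and j not in used_picked:
            used_truth.add(i)
            used_picked.add(j)
            matches += 1
    return matches


def precision_recall(truth: List[Point], picked: List[Point], max_dist: float) -> Tuple[float, float]:
    """Return (precision, recall) of the picks against the labels."""
    tp = match_particles(truth, picked, max_dist)
    precision = tp / len(picked) if picked else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical coordinates, not drawn from any real micrograph.
    labels = [(10.0, 10.0), (50.0, 50.0), (90.0, 90.0)]
    picks = [(12.0, 11.0), (52.0, 49.0), (200.0, 200.0), (11.0, 10.0)]
    print(precision_recall(labels, picks, max_dist=5.0))
```

In practice the threshold would likely be tied to the particle diameter, and an optimal assignment could replace the greedy matching; both choices are left open here.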

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287764
DOI: http://dx.doi.org/10.1038/s41597-023-02280-2

Similar Publications

Segmentation is a critical data processing step in many applications of cryo-electron tomography. Downstream analyses, such as subtomogram averaging, are often based on segmentation results, and are thus critically dependent on the availability of open-source software for accurate as well as high-throughput tomogram segmentation. There is a need for more user-friendly, flexible, and comprehensive segmentation software that offers an insightful overview of all steps involved in preparing automated segmentations.

Automatic single particle picking is a critical step in the data processing pipeline of cryo-electron microscopy structure reconstruction. In recent years, several deep learning-based algorithms have been developed, demonstrating their potential to solve this challenge. However, current methods highly depend on manually labeled training data, which is labor-intensive and prone to biases especially for high-noise and low-contrast micrographs, resulting in suboptimal precision and recall.

Immune profile diversity is achieved with synthetic TLR4 agonists combined with the RG1-VLP vaccine in mice.

Vaccine

January 2025

Cancer ImmunoPrevention Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.

The TLR4 (Toll-like receptor 4)-activating agonist MPLA (monophosphoryl lipid A) is a key component of the adjuvant systems AS01 and AS04, utilized in marketed preventive vaccines for several infectious pathogens. As MPLA is a biologically-derived product containing a mixture of several lipid A congeners with a 4' phosphoryl group and varying numbers of acyl chains with distinct activities, extensive efforts to refine its production and immunogenicity are ongoing; notably, the development of the BECC (Bacterial Enzymatic Combinatorial Chemistry) system in which bacteria express lipid A-modifying enzymes to produce a panoply of lipid A congeners. In an effort to characterize the adjuvant activity of these lipid A congeners, we compared biologically-derived and synthetic versions of BECC470 and BECC438 for adjuvant activity in BALB/c mice vaccinated with the HPV (Human papilloma virus) VLP-based vaccine, RG1-VLP.

Particle picking in cryo-electron tomograms (cryo-ET) is crucial for in situ structure detection of macromolecules and protein complexes. The traditional template-matching-based approaches for particle picking suffer from template-specific biases and have low throughput. Given these problems, learning-based solutions are necessary for particle picking.

Cryo-EM particle identification from micrographs ("picking") is challenging due to the low signal-to-noise ratio and lack of ground truth for particle locations. State-of-the-art computational algorithms ("pickers") identify different particle sets, complicating the selection of the best-suited picker for a protein of interest. Here, we present REliable PIcking by Consensus (REPIC), a computational approach to identifying particles common to the output of multiple pickers.
