Precision requirements and data compression in CryoEM/CryoET.

J Struct Biol

Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, United States.

Published: September 2022

With larger, higher speed detectors and improved automation, individual CryoEM instruments are capable of producing a prodigious amount of data each day, which must then be stored, processed and archived. While it has become routine to use lossless compression on raw counting-mode movies, the averages which result after correcting these movies no longer compress well. These averages could be considered sufficient for long term archival, yet they are conventionally stored with 32 bits of precision, despite high noise levels. Derived images are similarly stored with excess precision, providing an opportunity to decrease project sizes and improve processing speed. We present a simple argument based on propagation of uncertainty for safe bit truncation of flat-fielded images combined with lossless compression. The same method can be used for most derived images throughout the processing pipeline. We test the proposed strategy on two standard, data-limited CryoEM data sets, demonstrating that these limits are safe for real-world use. We find that 5 bits of precision is sufficient for virtually any raw CryoEM data and that 8-12 bits is sufficient for intermediate averages or final 3-D structures. Additionally, we detail and recommend specific rules for discretization of data as well as a practical compressed data representation that is tuned to the specific needs of CryoEM.
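The idea of truncating precision before applying lossless compression can be illustrated with a minimal sketch. The code below is not the authors' discretization rule or file format: it assumes a simple linear min/max quantizer, uses a synthetic noise-dominated image as a hypothetical stand-in for a flat-fielded average, and uses zlib as a generic lossless compressor. All names (`quantize`, `dequantize`, `image`) are illustrative.

```python
import random
import struct
import zlib


def quantize(values, bits):
    """Map floats linearly onto integer codes in [0, 2**bits - 1] (assumed scheme)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = levels / (hi - lo) if hi > lo else 0.0
    return [round((v - lo) * scale) for v in values], lo, scale


def dequantize(codes, lo, scale):
    """Invert quantize() up to the rounding error of one half step."""
    if scale == 0.0:
        return [lo for _ in codes]
    return [lo + c / scale for c in codes]


rng = random.Random(0)
# Hypothetical test image: a weak ramp signal buried in unit-variance Gaussian noise.
image = [0.1 * (i % 64) / 64 + rng.gauss(0.0, 1.0) for i in range(64 * 64)]

# Baseline: 32-bit float storage followed by lossless compression.
raw_bytes = struct.pack(f"<{len(image)}f", *image)
raw_size = len(zlib.compress(raw_bytes, 9))

# Truncated: 5-bit codes (stored one per byte) followed by the same compressor.
codes, lo, scale = quantize(image, bits=5)
quant_size = len(zlib.compress(bytes(codes), 9))

# Worst-case reconstruction error compared with the quantization step.
restored = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(image, restored))
step = 1.0 / scale

print(f"float32 + zlib: {raw_size} bytes")
print(f"5-bit   + zlib: {quant_size} bytes")
print(f"max error {max_err:.4f} vs step {step:.4f}")
```

In this toy setting the quantization error is bounded by half a step, which is small relative to the noise standard deviation, while the noisy float mantissas that defeat the compressor are removed. Whether a given bit depth is safe for real data depends on the uncertainty argument made in the paper, not on this sketch.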

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9645247
DOI: http://dx.doi.org/10.1016/j.jsb.2022.107875

Publication Analysis

Top Keywords

lossless compression: 8
bits precision: 8
derived images: 8
cryoem data: 8
data: 6
precision: 4
precision requirements: 4
requirements data: 4
data compression: 4
compression cryoem/cryoet: 4

Similar Publications

Deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sequence compressors for novel species frequently face challenges when processing wide-scale raw, FASTA, or multi-FASTA structured data. For years, molecular sequence databases have favored the widely used general-purpose Gzip and Zstd compressors. The absence of sequence-specific characteristics in these encoders results in subpar performance, and their use depends on time-consuming parameter adjustments.

A DNA Data Storage Method Using Spatial Encoding Based Lossless Compression.

Entropy (Basel)

December 2024

Computer Engineering Department, Düzce University, 81620 Düzce, Turkey.

With the rapid increase in global data and rapid development of information technology, DNA sequences have been collected and manipulated on computers. This has yielded a new and attractive field of bioinformatics, DNA storage, where DNA has been considered as a great potential storage medium. It is known that one gram of DNA can store 215 GB of data, and the data stored in the DNA can be preserved for tens of thousands of years.

Lossless Image Compression Using Context-Dependent Linear Prediction Based on Mean Absolute Error Minimization.

Entropy (Basel)

December 2024

Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, ul. Żołnierska 49, 71-210 Szczecin, Poland.

This paper presents a method for lossless compression of images with fast decoding time and the option to select encoder parameters for individual image characteristics to increase compression efficiency. The data modeling stage was based on linear and nonlinear prediction, which was complemented by a simple block for removing the context-dependent constant component. The prediction was based on the Iterative Reweighted Least Squares (IRLS) method which allowed the minimization of mean absolute error.

State-of-the-Art Trends in Data Compression: COMPROMISE Case Study.

Entropy (Basel)

November 2024

Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška cesta 46, SI-2000 Maribor, Slovenia.

After a boom that coincided with the advent of the internet, digital cameras, digital video and audio storage and playback devices, the research on data compression has rested on its laurels for a quarter of a century. Domain-dependent lossy algorithms of the time, such as JPEG, AVC, MP3 and others, achieved remarkable compression ratios and encoding and decoding speeds with acceptable data quality, which has kept them in common use to this day. However, recent computing paradigms such as cloud computing, edge computing, the Internet of Things (IoT), and digital preservation have gradually posed new challenges, and, as a consequence, development trends in data compression are focusing on concepts that were not previously in the spotlight.

Lossless and reference-free compression of FASTQ/A files using GeneSqueeze.

Sci Rep

January 2025

Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA 19355, USA.

As sequencing becomes more accessible, there is an acute need for novel compression methods to efficiently store sequencing files. Omics analytics can leverage sequencing technologies to enhance biomedical research and individualize patient care, but sequencing files demand immense storage capabilities, particularly when sequencing is utilized for longitudinal studies. Addressing the storage challenges posed by these technologies is crucial for omics analytics to achieve their full potential.
