In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. As the experimental time and cost necessary to produce DNA sequences decrease, the computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, which store only the differences between a to-be-compressed input and a known reference sequence, have gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios well beyond the state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
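The idea of referential compression described in the abstract, storing only the differences between an input and a known reference, can be illustrated with a minimal Python sketch. This is not FRESCO's actual algorithm or API: the function names (`build_index`, `compress`, `decompress`), the k-mer hash index, the greedy longest-match strategy, and the default `k = 4` are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple, Union

# An entry is either a (position, length) match into the reference
# or a raw string of characters that could not be matched.
Entry = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Entry]:
    """Greedily encode target as references into `reference` plus literals."""
    index = build_index(reference, k)
    entries: List[Entry] = []
    literal: List[str] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position sharing the next k-mer; keep the longest match.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            if literal:  # flush pending unmatched characters first
                entries.append("".join(literal))
                literal = []
            entries.append((best_pos, best_len))
            i += best_len
        else:
            literal.append(target[i])
            i += 1
    if literal:
        entries.append("".join(literal))
    return entries


def decompress(entries: List[Entry], reference: str) -> str:
    """Rebuild the original sequence from the entries and the reference."""
    parts = []
    for entry in entries:
        if isinstance(entry, tuple):
            pos, length = entry
            parts.append(reference[pos:pos + length])
        else:
            parts.append(entry)
    return "".join(parts)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"  # differs from the reference by one substitution
entries = compress(target, reference)
print(entries)  # [(0, 7), 'A', (8, 7)]
```

For two highly similar sequences, most of the input collapses into a few long matches, so the output size depends on the number of differences rather than on the sequence length. Applying the same idea to the match entries themselves would correspond loosely to the second-order compression mentioned above, although that step is not sketched here.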
DOI: http://dx.doi.org/10.1109/tcbb.2013.122
Undersea Hyperb Med
January 2025
King Hamad American Mission Hospital, A'ali, Kingdom of Bahrain.
Middle ear barotrauma (MEBT) is the most common complication of hyperbaric oxygen therapy (HBO). This study explored the impact of altering the shape of the time-pressure curve with the aim of reducing the occurrence of MEBT and optimizing the HBO experience during the pressurization process. Four distinct mathematically derived protocols, namely Constant Pressure Difference (CPD), Constant Volume Difference (CVD), Constant Ratio (CR), and Inverted Constant Ratio (ICR), were investigated using computer simulations on a simple ear model.
J Phys Chem B
January 2025
Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Sienkiewicza 112, Lodz 90-363, Poland.
This work focuses on the impact of temperature and deformation on the mechanical properties, specifically the elastic modulus, of the amorphous regions in semicrystalline polymers, using polypropylene as a case study. It has been shown that increasing temperature decreases this modulus because of the enhanced mobility of polymer chains, triggered by the activation of α relaxation processes within the crystalline component. Consequently, rising temperature reduces the "stiffening" effect of the crystalline regions on the interlamellar layers.
We propose and demonstrate a photonic compressive sensing (PCS) scheme for microwave signals using optical pulse random mixing, significantly enhancing both the compression ratio and operating frequency range. Unlike continuous-wave laser-based PCS systems, our approach mitigates the non-ideal characteristics of the pseudo-random binary sequence (PRBS), such as sloped edges and amplitude jitters, resulting in a more ideal compression process. Additionally, the high harmonic components of the optical pulses further facilitate wideband downconversion, improving the system's operating frequency range.
Sci Rep
January 2025
College of Safety Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China.
The synergistic utilization of multiple solid wastes is an effective means of achieving green filling and resource utilization of solid waste in mines. In this paper, the synergistic effects of the solid wastes granulated blast furnace slag (GS) and carbide slag (CS) as cementitious materials (GCCM) are investigated, along with their preliminary feasibility in combination with coal gangue (CG) and furnace bottom slag (FBS) for the preparation of backfill materials. The synergistic hydration mechanism, mechanical properties, and working performance of GCCM and GBC were studied, and the environmental impact and cost-effectiveness of GBC were evaluated.
Radiol Artif Intell
January 2025
From the Department of Radiology, University Hospital, LMU Munich, Marchioninistr 15, 81377 Munich, Germany (T.W., J.D., M.I.); Department of Statistics, LMU Munich, Munich, Germany (T.W., D.R.); and Munich Center for Machine Learning, Munich, Germany (T.W., J.D., D.R., M.I.).
Purpose To investigate whether the computational effort of 3D CT-based multiorgan segmentation with TotalSegmentator can be reduced via Tucker decomposition-based network compression. Materials and Methods In this retrospective study, Tucker decomposition was applied to the convolutional kernels of the TotalSegmentator model, an nnU-Net model trained on a comprehensive CT dataset for automatic segmentation of 117 anatomic structures. The proposed approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality.