Summary: We propose RabbitKSSD, a high-speed genome distance estimation tool. Specifically, we leverage load-balanced task partitioning, fast I/O, efficient intermediate result accesses, and high-performance data structures to improve overall efficiency. Our performance evaluation demonstrates that RabbitKSSD achieves speedups ranging from 5.
View Article and Find Full Text PDFAssessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems.
View Article and Find Full Text PDFWe present RabbitTClust, a fast and memory-efficient genome clustering tool based on sketch-based distance estimation. Our approach enables efficient processing of large-scale datasets by combining dimensionality reduction techniques with streaming and parallelization on modern multi-core platforms. 113,674 complete bacterial genome sequences from RefSeq, 455 GB in FASTA format, can be clustered within less than 6 min and 1,009,738 GenBank assembled bacterial genomes, 4.
View Article and Find Full Text PDFIEEE/ACM Trans Comput Biol Bioinform
June 2023
The continuous growth of generated sequencing data leads to the development of a variety of associated bioinformatics tools. However, many of them are not able to fully exploit the resources of modern multi-core systems since they are bottlenecked by parsing files leading to slow execution times. This motivates the design of an efficient method for parsing sequencing data that can exploit the power of modern hardware, especially for modern CPUs with fast storage devices.
View Article and Find Full Text PDFMotivation: Detection and identification of viruses and microorganisms in sequencing data plays an important role in pathogen diagnosis and research. However, existing tools for this problem often suffer from high runtimes and memory consumption.
Results: We present RabbitV, a tool for rapid detection of viruses and microorganisms in Illumina sequencing datasets based on fast identification of unique k-mers.
The MnO-TNTA composite electrodes were obtained through depositing MnO into TiO nanotube arrays (TNTA) by successive ionic layer adsorption reaction (SILAR) and subsequent hydrothermal method. The MnO-TNTA nanocomposites were used as electrochemical sensors for the detection of hydrogen peroxide (HO). The preparation conditions of MnO-TNTA electrodes and test conditions affect the electrochemical detection performance significantly.
View Article and Find Full Text PDFMotivation: Mash is a popular hash-based genome analysis toolkit with applications to important downstream analyses tasks such as clustering and assembly. However, Mash is currently not able to fully exploit the capabilities of modern multi-core architectures, which in turn leads to high runtimes for large-scale genomic datasets.
Results: We present RabbitMash, an efficient highly optimized implementation of Mash which can take full advantage of modern hardware including multi-threading, vectorization and fast I/O.
Motivation: Modern sequencing technologies continue to revolutionize many areas of biology and medicine. Since the generated datasets are error-prone, downstream applications usually require quality control methods to pre-process FASTQ files. However, existing tools for this task are currently not able to fully exploit the capabilities of computing platforms leading to slow runtimes.
View Article and Find Full Text PDFComput Struct Biotechnol J
August 2017
The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data.
View Article and Find Full Text PDF