Publications by Zekun Yin

RabbitKSSD: accelerating genome distance estimation on modern multi-core architectures.

Xiaoming Xu, Zekun Yin, Lifeng Yan, Huiguang Yi, Hua Wang

Bioinformatics

November 2023

Summary: We propose RabbitKSSD, a high-speed genome distance estimation tool. Specifically, we leverage load-balanced task partitioning, fast I/O, efficient intermediate result accesses, and high-performance data structures to improve overall efficiency. Our performance evaluation demonstrates that RabbitKSSD achieves speedups ranging from 5.

RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data.

Lifeng Yan, Zekun Yin, Hao Zhang, Zhan Zhao, Mingkai Wang

Methods

August 2023

Assessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems.

RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches.

Xiaoming Xu, Zekun Yin, Lifeng Yan, Hao Zhang, Borui Xu

Genome Biol

May 2023

We present RabbitTClust, a fast and memory-efficient genome clustering tool based on sketch-based distance estimation. Our approach enables efficient processing of large-scale datasets by combining dimensionality reduction techniques with streaming and parallelization on modern multi-core platforms. 113,674 complete bacterial genome sequences from RefSeq, 455 GB in FASTA format, can be clustered within less than 6 min and 1,009,738 GenBank assembled bacterial genomes, 4.

RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms.

Hao Zhang, Honglei Song, Xiaoming Xu, Qixin Chang, Mingkai Wang, Zekun Yin

IEEE/ACM Trans Comput Biol Bioinform

June 2023

The continuous growth of generated sequencing data has led to the development of a variety of associated bioinformatics tools. However, many of them cannot fully exploit the resources of modern multi-core systems because they are bottlenecked by file parsing, which leads to slow execution times. This motivates the design of an efficient method for parsing sequencing data that can exploit the power of modern hardware, especially modern CPUs with fast storage devices.

RabbitV: fast detection of viruses and microorganisms in sequencing data on multi-core architectures.

Hao Zhang, Qixin Chang, Zekun Yin, Xiaoming Xu, Yanjie Wei

Bioinformatics

May 2022

Motivation: Detection and identification of viruses and microorganisms in sequencing data play an important role in pathogen diagnosis and research. However, existing tools for this problem often suffer from high runtimes and memory consumption.

Results: We present RabbitV, a tool for rapid detection of viruses and microorganisms in Illumina sequencing datasets based on fast identification of unique k-mers.

Facile preparation of MnO₂-TiO₂ nanotube arrays composite electrode for electrochemical detection of hydrogen peroxide.

Mengyao Yang, Zhigang Wu, Xixin Wang, Zekun Yin, Xu Tan

Talanta

July 2022

The MnO₂-TNTA composite electrodes were obtained by depositing MnO₂ into TiO₂ nanotube arrays (TNTA) through successive ionic layer adsorption and reaction (SILAR) followed by a hydrothermal method. The MnO₂-TNTA nanocomposites were used as electrochemical sensors for the detection of hydrogen peroxide (H₂O₂). The preparation conditions of the MnO₂-TNTA electrodes and the test conditions significantly affect the electrochemical detection performance.

RabbitMash: accelerating hash-based genome analysis on modern multi-core architectures.

Zekun Yin, Xiaoming Xu, Jinxiao Zhang, Yanjie Wei, Bertil Schmidt

Bioinformatics

May 2021

Motivation: Mash is a popular hash-based genome analysis toolkit with applications to important downstream analysis tasks such as clustering and assembly. However, Mash is currently not able to fully exploit the capabilities of modern multi-core architectures, which in turn leads to high runtimes for large-scale genomic datasets.

Results: We present RabbitMash, an efficient, highly optimized implementation of Mash that takes full advantage of modern hardware features, including multi-threading, vectorization and fast I/O.

RabbitQC: high-speed scalable quality control for sequencing data.

Zekun Yin, Hao Zhang, Meiyang Liu, Wen Zhang, Honglei Song

Bioinformatics

May 2021

Motivation: Modern sequencing technologies continue to revolutionize many areas of biology and medicine. Since the generated datasets are error-prone, downstream applications usually require quality control methods to pre-process FASTQ files. However, existing tools for this task are currently not able to fully exploit the capabilities of computing platforms, leading to slow runtimes.

Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

Zekun Yin, Haidong Lan, Guangming Tan, Mian Lu, Athanasios V Vasilakos

Comput Struct Biotechnol J

August 2017

The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the volume of biological data is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in the life sciences. As a result, both biologists and computer scientists face the challenge of gaining profound insight into the deepest biological functions from big biological data.
