Background: Next-generation sequencing (NGS) methods pose the computational challenge of handling large volumes of data. Although cloud computing offers a potential solution, transferring a large data set across the internet is the biggest obstacle, one that efficient encoding methods may overcome. When encoding is used to facilitate data transfer to the cloud, encoding time is as important as encoding efficiency. Moreover, to take advantage of parallel processing in the cloud, a parallel technique to decode and split the compressed data there is essential. Hence, in this review, we present SOLiDzipper, a new encoding method for NGS data.
Methods: The basic strategy of SOLiDzipper is to divide and encode. NGS data files contain both sequence and non-sequence information, which differ in encoding efficiency. In SOLiDzipper, encoded data are stored in a binary data block that does not contain information characteristic of a specific sequencing platform, which means that the data can be decoded for a desired platform, even in the case of Illumina, Solexa or Roche 454 data.
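The divide-and-encode idea can be illustrated with a minimal sketch. It is not SOLiDzipper's actual file format, which the abstract does not specify. The sketch assumes SOLiD color-space reads whose calls are the digits 0 to 3 with no missing calls, packs them at 2 bits per call, and compresses the headers separately with zlib. All function names are hypothetical.

```python
import zlib


def pack_colors(colors: str) -> bytes:
    """Pack color calls ('0'-'3') into 2 bits each, 4 calls per byte."""
    if any(c not in "0123" for c in colors):
        raise ValueError("only color calls 0-3 are supported in this sketch")
    out = bytearray()
    for i in range(0, len(colors), 4):
        chunk = colors[i:i + 4]
        byte = 0
        for c in chunk:
            byte = (byte << 2) | int(c)
        # Left-align a short final chunk so decoding can read fixed positions.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_colors(data: bytes, n: int) -> str:
    """Recover the first n color calls from a packed block."""
    digits = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            digits.append(str((byte >> shift) & 0b11))
    return "".join(digits[:n])


def encode_reads(reads):
    """Divide reads into a non-sequence block and a sequence block.

    reads is a list of (header, colors) pairs. Headers are assumed to be
    ASCII without embedded newlines (an assumption of this sketch).
    """
    meta_block = zlib.compress("\n".join(h for h, _ in reads).encode("ascii"))
    lengths = [len(c) for _, c in reads]
    seq_block = pack_colors("".join(c for _, c in reads))
    return meta_block, seq_block, lengths


def decode_reads(meta_block, seq_block, lengths):
    """Reassemble (header, colors) pairs from the two blocks."""
    headers = zlib.decompress(meta_block).decode("ascii").split("\n")
    colors = unpack_colors(seq_block, sum(lengths))
    reads, pos = [], 0
    for header, n in zip(headers, lengths):
        reads.append((header, colors[pos:pos + n]))
        pos += n
    return reads
```

Keeping the sequence block free of headers is what would let it be split at read boundaries and decoded in parallel; the per-read `lengths` list plays that role here.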
Results: The main calculation time using Crossbow was 173 minutes when 40 EC2 nodes were involved. In that case, an analysis preparation time of 464 minutes was required to encode the data with a recent DNA compression method such as G-SQZ and transmit it at a bandwidth of 183 Mbit/s. By contrast, it took 194 minutes to encode and transmit the data with SOLiDzipper under the same bandwidth conditions. These results indicate that, at a fixed network bandwidth, the total processing time depends on the encoding method used. Given limited network bandwidth, high-speed, high-efficiency encoding methods such as SOLiDzipper can contribute significantly to productivity in labs seeking to use the cloud as an alternative to local computing.
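The arithmetic behind this comparison can be written out as a short sketch. It uses only the figures quoted above and assumes that preparation (encoding plus transfer) and the Crossbow run happen one after the other, which the abstract implies but does not state. The helper `transfer_minutes` is a hypothetical illustration of how transfer time scales with file size; the abstract gives no file sizes, so it is not applied to real data here.

```python
# Figures quoted in the Results paragraph (minutes).
CROSSBOW_MIN = 173
PREP_MIN = {"G-SQZ": 464, "SOLiDzipper": 194}


def transfer_minutes(size_bytes: float, bandwidth_mbit_s: float) -> float:
    """Time to send size_bytes over a link of bandwidth_mbit_s (1 Mbit = 10**6 bits)."""
    return size_bytes * 8 / (bandwidth_mbit_s * 1_000_000) / 60


def end_to_end_minutes(prep_min: float, compute_min: float = CROSSBOW_MIN) -> float:
    """Total time, assuming preparation and computation run sequentially."""
    return prep_min + compute_min


totals = {name: end_to_end_minutes(m) for name, m in PREP_MIN.items()}
saved = totals["G-SQZ"] - totals["SOLiDzipper"]

for name, total in totals.items():
    print(f"{name}: {total} min end to end")
print(f"Saved with SOLiDzipper: {saved} min ({saved / totals['G-SQZ']:.1%})")
```

Under that sequential assumption the totals are 637 and 367 minutes, a saving of 270 minutes, all of it from the preparation stage.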
Availability: http://szipper.dinfree.com. Academic/non-profit: Binary available for direct download at no cost. For-profit: Submit a request for a for-profit license through the website.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3072624 | PMC |
http://dx.doi.org/10.4137/EBO.S6618 | DOI Listing |