Background: Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses.
Results: Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. This time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log running metrics during data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies.
Conclusions: Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
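As an illustration of improvement (2) above, splitting a large sequence file so that downstream workers receive evenly sized inputs, the following is a minimal sketch in Python. It is not Rainbow's implementation: the function name `split_fastq`, the `reads_per_chunk` parameter, and the assumption of unwrapped four-line FASTQ records are all hypothetical choices made for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk reads.

    Assumption (not from the source): every record is exactly four lines
    (header, sequence, '+', qualities) with no line wrapping.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    it = iter(lines)
    while True:
        # Pull the next block of whole records (4 lines per read).
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record")
        yield chunk


# Toy input: three reads, split into chunks of at most two reads each.
demo_lines = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "TTGA", "+", "IIII",
    "@r3", "GGCC", "+", "IIII",
]
demo_chunks = list(split_fastq(demo_lines, reads_per_chunk=2))
```

Each chunk could then be written to its own file and handed to a separate worker, so that no single task dominates the running time. In practice the input would be an open file handle rather than an in-memory list.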
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3698007 | PMC |
http://dx.doi.org/10.1186/1471-2164-14-425 | DOI Listing |
Front Immunol
January 2025
Cancer Discovery Hub, National Cancer Centre Singapore, Singapore, Singapore.
Introduction: Recent epidemiological data suggest a rising incidence of breast angiosarcoma (AS-B) in the Western population, with over two-thirds related to irradiation or chronic lymphedema. However, unlike head and neck angiosarcoma (AS-HN), AS-B disease characteristics in Asia remain unclear.
Methods: We examined clinical patterns of angiosarcoma patients (n = 176) seen in an Asian tertiary cancer center from 1999 to 2021, and specifically investigated the molecular and immune features of AS-B in comparison to AS-HN.
Genome Med
January 2025
Department of Epidemiology of Microbial Disease, Yale School of Public Health, 60 College Street, New Haven, CT, USA.
Background: Mixed infection with multiple strains of the same pathogen in a single host can present clinical and analytical challenges. Whole genome sequence (WGS) data can identify signals of multiple strains in samples, though the precision of previous methods can be improved. Here, we present MixInfect2, a new tool to accurately detect mixed samples from Mycobacterium tuberculosis short-read WGS data.
BMC Genomics
January 2025
State Key Laboratory of Animal Biotech Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China.
Background: Sheep horns play a critical role in the survival and reproduction of sheep. Research on sheep horns not only aids in comprehending their biological roles but is also vital for developing hornless breeds. Although previous studies have suggested that KLK7 may be associated with keratin growth, there are few studies that have focused on the role of KLK7 in sheep horns.
PLoS One
January 2025
Departamento de Bioquímica y Medicina Molecular, Universidad Autónoma de Nuevo León, Monterrey, Nuevo León, México.
Introduction: The methicillin-resistant Staphylococcus aureus (MRSA) genome varies by geographical location. This study aims to determine the genomic characteristics of MRSA using whole-genome sequencing (WGS) data from medical centers in Mexico and to explore the associations between antimicrobial resistance genes and virulence factors.
Methods: This study included 27 clinical isolates collected from sterile sites at eight centers in Mexico in 2022 and 2023.
Microb Genom
January 2025
Department of Laboratory Medicine, Clinical Microbiology, Faculty of Medicine and Health, Örebro University, Örebro, Sweden.
National epidemiological investigations of microbial infections greatly benefit from the increased information gained by whole-genome sequencing (WGS) in combination with standardized approaches for data sharing and analysis. This study aimed to evaluate the quality and accuracy of WGS data generated by different laboratories but analysed by joint pipelines, with a view to a national surveillance approach. A national methicillin-resistant Staphylococcus aureus (MRSA) collection of 20 strains was distributed to nine participating laboratories that performed in-house procedures for WGS.