The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this end, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end next-generation sequencing (NGS) data. To benchmark this method, we considered the classic APOE polymorphism (*1/*2/*3/*4), since it represents one of the best examples of a functional polymorphism arising from the haplotype combination of two single nucleotide polymorphisms (SNPs). We leveraged the large whole-exome sequencing (WES) and SNP-array datasets from the multi-ethnic UK Biobank (UKBB, N=48,855). By applying PERHAPS, which pieces together paired-end reads according to their FASTQ labels, we extracted the haplotypes, along with their frequencies and the individual diplotypes. Concordance rates between diplotypes called directly from WES and those generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), whether the sample is stratified by SNP-array genotyping batch or by self-reported ethnic group. Hardy-Weinberg equilibrium tests and the comparison of the obtained haplotype frequencies with those available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. While we acknowledge some technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.
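The core idea described in the abstract, linking the alleles seen on the two mates of a read pair through their shared FASTQ label, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the per-read allele observations have already been extracted from aligned reads, and it uses the commonly cited definitions of the APOE haplotypes in terms of the SNPs rs429358 and rs7412, which the abstract does not name. The function names, the input layout, and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: commonly cited forward-strand allele definitions.
# Keys are (rs429358 base, rs7412 base).
APOE_HAPLOTYPES = {
    ("C", "T"): "APOE*1",
    ("T", "T"): "APOE*2",
    ("T", "C"): "APOE*3",
    ("C", "C"): "APOE*4",
}


def call_haplotypes(observations):
    """Count haplotypes supported by read pairs that cover both SNPs.

    observations: iterable of (read_name, snp_id, base) tuples.
    Both mates of a pair carry the same read_name, so grouping by it
    joins alleles observed on the same DNA fragment.
    Simplification: if both mates cover the same SNP, the last base wins.
    """
    by_pair = defaultdict(dict)
    for read_name, snp_id, base in observations:
        by_pair[read_name][snp_id] = base

    counts = Counter()
    for alleles in by_pair.values():
        key = (alleles.get("rs429358"), alleles.get("rs7412"))
        # Pairs covering only one SNP are uninformative and skipped.
        if key in APOE_HAPLOTYPES:
            counts[APOE_HAPLOTYPES[key]] += 1
    return counts


def call_diplotype(counts, min_fraction=0.2):
    """Return a (hap1, hap2) tuple, or None if the evidence is unclear.

    min_fraction is a hypothetical noise filter: haplotypes supported by
    a smaller share of informative pairs are ignored.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    kept = sorted(h for h, n in counts.items() if n / total >= min_fraction)
    if len(kept) == 1:
        return (kept[0], kept[0])  # homozygous
    if len(kept) == 2:
        return (kept[0], kept[1])  # heterozygous
    return None  # more than two haplotypes: ambiguous


# Toy input: three informative pairs and one pair covering a single SNP.
observations = [
    ("pair1", "rs429358", "T"), ("pair1", "rs7412", "C"),
    ("pair2", "rs429358", "C"), ("pair2", "rs7412", "C"),
    ("pair3", "rs429358", "T"), ("pair3", "rs7412", "C"),
    ("pair4", "rs429358", "T"),
]
counts = call_haplotypes(observations)
diplotype = call_diplotype(counts)
```

In this toy input, two pairs support APOE*3 and one supports APOE*4, so the sketch reports a heterozygous APOE*3/APOE*4 diplotype. A real pipeline would also need to handle base quality, mapping quality, and conflicting mates, none of which are modelled here.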
DOI: http://dx.doi.org/10.1093/bib/bbaa320
Front Parasitol
April 2024
Centre for Malaria Elimination, Institute of Tropical Medicine, Mount Kenya University, Thika, Kenya.
The Circumsporozoite Protein (PfCSP) has been used in developing the RTS,S and R21 malaria vaccines. However, genetic polymorphisms within PfCSP compromise the effectiveness of the vaccine. Thus, it is essential to continuously assess the genetic diversity of PfCSP, especially when deploying the vaccine across different geographical regions.
Viruses
November 2024
Institute of Biology, ELTE Eötvös Loránd University, 1117 Budapest, Hungary.
The increasingly widespread application of next-generation sequencing (NGS) in clinical diagnostics and epidemiological research has generated a demand for robust, fast, automated, and user-friendly bioinformatics workflows. To guide the choice of tools for the assembly of full-length viral genomes from NGS datasets, we assessed the performance and applicability of four open-source bioinformatics pipelines (shiver, for which we created a user-friendly Dockerized version referred to as dshiver; SmaltAlign; viral-ngs; and V-pipe) using both simulated and real-world HIV-1 paired-end short-read datasets and default settings. All four pipelines produced consensus genome assemblies with high quality metrics (genome fraction recovery, mismatch and indel rates, variant calling F1 scores) when the reference sequence used for assembly had high similarity to the analyzed sample.
Bioinform Biol Insights
December 2024
Instituto de Agrobiotecnología y Biología Molecular (IABIMO), CICVyA, Instituto Nacional de Tecnología Agropecuaria (INTA), Buenos Aires, Argentina.
De novo assembly of transcriptomes from species without a reference genome remains a common problem in functional genomics. While methods and algorithms for transcriptome assembly are continually being developed and published, the quality of de novo assemblies using short reads depends on the complexity of the transcriptome and is limited by several types of errors. One problem to overcome is the research gap regarding the best method to use in each study to obtain a high-quality de novo assembly.
GigaByte
November 2024
Institute for Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Strasse 9, Frankfurt am Main, 60438, Germany.
Brief Bioinform
September 2024
Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia.
Over the past decade, there have been many improvements in the field of metagenomics, including sequencing technologies, advances in bioinformatics and the development of reference databases, but a one-size-fits-all sequencing and bioinformatics pipeline does not yet seem achievable. In this study, we address the bioinformatics part of the analysis by combining three methods into a three-step workflow that increases the sensitivity and specificity of clinical metagenomics and improves pathogen detection. The individual tools are combined into MetaAll, a user-friendly workflow suitable for analysing short paired-end (PE) and long reads from metagenomics datasets.