The identification of rare haplotypes may greatly expand our knowledge in the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the APOE classic polymorphism (*1/*2/*3/*4), since it represents one of the best examples of functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the big Whole Exome Sequencing (WES) and SNP-array data obtained from the multi-ethnic UK BioBank (UKBB, N=48,855). By applying PERHAPS, based on piecing together the paired-end reads according to their FASTQ-labels, we extracted the haplotype data, along with their frequencies and the individual diplotype. Concordance rates between WES directly called diplotypes and the ones generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), either when stratifying the sample by SNP-array genotyping batch or self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of obtained haplotype frequencies with the ones available from the 1000 Genome Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximatively 0.5%) in the African Yoruba population. Despite acknowledging some technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.

Download full-text PDF

Source
http://dx.doi.org/10.1093/bib/bbaa320DOI Listing

Publication Analysis

Top Keywords

paired-end short
8
short reads-based
8
reads-based haplotyping
8
haplotyping next-generation
8
next-generation sequencing
8
sequencing data
8
simple approach
8
snp-array data
8
data
6
sequencing
5

Similar Publications

The Circumsporozoite Protein (PfCSP) has been used in developing the RTS,S, and R21 malaria vaccines. However, genetic polymorphisms within compromise the effectiveness of the vaccine. Thus, it is essential to continuously assess the genetic diversity of , especially when deploying it across different geographical regions.

View Article and Find Full Text PDF

The increasingly widespread application of next-generation sequencing (NGS) in clinical diagnostics and epidemiological research has generated a demand for robust, fast, automated, and user-friendly bioinformatics workflows. To guide the choice of tools for the assembly of full-length viral genomes from NGS datasets, we assessed the performance and applicability of four open-source bioinformatics pipelines (shiver-for which we created a user-friendly Dockerized version, referred to as dshiver; SmaltAlign; viral-ngs; and V-pipe) using both simulated and real-world HIV-1 paired-end short-read datasets and default settings. All four pipelines produced consensus genome assemblies with high quality metrics (genome fraction recovery, mismatch and indel rates, variant calling F1 scores) when the reference sequence used for assembly had high similarity to the analyzed sample.

View Article and Find Full Text PDF

Comprehensive Analysis of the Influence of Technical and Biological Variations on De Novo Assembly of RNA-Seq Datasets.

Bioinform Biol Insights

December 2024

Instituto de Agrobiotecnología y Biología Molecular (IABIMO), CICVyA, Instituto Nacional de Tecnología Agropecuaria (INTA), Buenos Aires, Argentina.

De novo assembly of transcriptomes from species without reference genome remains a common problem in functional genomics. While methods and algorithms for transcriptome assembly are continually being developed and published, the quality of de novo assemblies using short reads depends on the complexity of the transcriptome and is limited by several types of errors. One problem to overcome is the research gap regarding the best method to use in each study to obtain high-quality de novo assembly.

View Article and Find Full Text PDF
Article Synopsis
  • The Baikal seal, a freshwater seal unique to Lake Baikal, has a long history of being landlocked and is classified as a species of least concern due to its stable population despite its limited habitat.
  • Recent research has expanded on genetic studies by sequencing the genomes of six Baikal seals alongside other seal species, enhancing our understanding of their evolutionary relationships.
  • Findings indicate that the genetic diversity of the Baikal seal is comparable to that of other seals, prompting calls for further research on genomic diversity across its range.
View Article and Find Full Text PDF

MetaAll: integrative bioinformatics workflow for analysing clinical metagenomic data.

Brief Bioinform

September 2024

Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia.

Over the past decade, there have been many improvements in the field of metagenomics, including sequencing technologies, advances in bioinformatics and the development of reference databases, but a one-size-fits-all sequencing and bioinformatics pipeline does not yet seem achievable. In this study, we address the bioinformatics part of the analysis by combining three methods into a three-step workflow that increases the sensitivity and specificity of clinical metagenomics and improves pathogen detection. The individual tools are combined into a user-friendly workflow suitable for analysing short paired-end (PE) and long reads from metagenomics datasets-MetaAll.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!