Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants.

Viachaslau Tsyvina, David S Campo, Seth Sims, Alex Zelikovsky, Yury Khudyakov, Pavel Skums

BMC Bioinformatics

Computer Science Department, Georgia State University, 25 Park Place NE, Atlanta, 30303, GA, USA.

Published: October 2018

Background: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by next-generation sequencing (NGS). Such tasks include detecting viral transmissions by analyzing all genetically close pairs of sequences from viral datasets sampled from infected individuals, and studying the evolution of viruses or immune repertoires by analyzing networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naive algorithms for extracting such sequence families are impractical given the massive size of modern NGS datasets.
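To make the scaling problem concrete, the following is a minimal sketch of one such naive approach, assuming equal-length (aligned) sequences and Hamming distance as the measure of genetic closeness. Both assumptions, and the names `hamming` and `naive_close_pairs`, are illustrative and are not taken from the paper.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def naive_close_pairs(seqs, threshold):
    """Return index pairs (i, j), i < j, whose Hamming distance is <= threshold.

    Every pair is compared, so the work grows quadratically with the
    number of sequences.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(seqs), 2)
        if hamming(a, b) <= threshold
    ]
```

With n sequences this performs n(n-1)/2 full comparisons, which is what becomes impractical at NGS dataset sizes.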

Results: In this paper, we present a fast and scalable k-mer-based framework that performs such sequence similarity queries efficiently and specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance than other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj

Conclusion: The proposed tool allows efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analysis of relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data.
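The abstract does not describe how the framework filters candidate pairs, so the sketch below does not reproduce the authors' method. It illustrates the general idea of substring-based filtering with a standard pigeonhole partition filter: if two equal-length sequences differ in at most t positions, then after cutting both into t + 1 segments at the same boundaries, at least one segment must match exactly. Equal-length sequences, Hamming distance, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def close_pairs_with_filter(seqs, threshold):
    """Return sorted index pairs (i, j), i < j, with Hamming distance <= threshold.

    Hypothetical illustration: only pairs sharing an identical segment at
    the same position are verified with a full comparison.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")

    # Cut every sequence into threshold + 1 segments at shared boundaries.
    parts = threshold + 1
    bounds = [length * p // parts for p in range(parts + 1)]

    # Bucket sequence indices by (segment number, segment content).
    buckets = defaultdict(list)
    for idx, s in enumerate(seqs):
        for p in range(parts):
            buckets[(p, s[bounds[p]:bounds[p + 1]])].append(idx)

    # Any two sequences in the same bucket form a candidate pair.
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(members, 2))

    # Verify candidates with the exact distance.
    return sorted(
        (i, j) for i, j in candidates if hamming(seqs[i], seqs[j]) <= threshold
    )
```

The filter never discards a true pair, but how many false candidates it lets through depends on how variability is distributed along the genome, which is the kind of filtering quality the abstract refers to.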

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6196405
DOI: http://dx.doi.org/10.1186/s12859-018-2333-9

