Background: Many biological analysis tasks require extracting families of genetically similar sequences from large datasets produced by next-generation sequencing (NGS). Examples include detecting viral transmissions by analyzing all genetically close pairs of sequences in viral datasets sampled from infected individuals, and studying the evolution of viruses or immune repertoires by analyzing networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. Naive algorithms for extracting such sequence families are impractical given the massive size of modern NGS datasets.
Results: In this paper, we present a fast and scalable k-mer-based framework for performing such sequence similarity queries efficiently. It specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and running time than the other tools it was compared with. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj
Conclusion: The proposed tool allows efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analyzing the relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data.
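The general idea of using exact substring matches to avoid comparing every pair of sequences can be illustrated with a minimal sketch. This is not the tool's algorithm, which the abstract does not describe in detail; it is a generic pigeonhole filter under simplifying assumptions: all sequences have equal length, similarity is measured by Hamming distance, and the names `close_pairs`, `segments`, and `hamming` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def segments(seq, parts):
    """Split seq into `parts` contiguous, non-overlapping chunks with their start offsets."""
    n = len(seq)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [(bounds[i], seq[bounds[i]:bounds[i + 1]]) for i in range(parts)]


def close_pairs(seqs, t):
    """Return index pairs (i, j), i < j, whose Hamming distance is at most t.

    Pigeonhole filter (illustrative assumption, not the published method):
    if two equal-length sequences differ in at most t positions, then at
    least one of t + 1 non-overlapping chunks must be identical in both.
    """
    # Index every sequence by each of its (offset, chunk) keys.
    index = defaultdict(list)
    for i, s in enumerate(seqs):
        for offset, chunk in segments(s, t + 1):
            index[(offset, chunk)].append(i)

    # Only sequences sharing at least one chunk become candidate pairs.
    candidates = set()
    for bucket in index.values():
        candidates.update(combinations(bucket, 2))

    # Verify each candidate with the exact distance.
    return sorted((i, j) for i, j in candidates if hamming(seqs[i], seqs[j]) <= t)


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "ACGAACGA"]
    print(close_pairs(reads, 1))
```

In this sketch the exact distance is computed only for pairs that share a chunk, so the number of full comparisons depends on how many sequences collide in the index rather than on the square of the dataset size.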
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6196405 | PMC |
http://dx.doi.org/10.1186/s12859-018-2333-9 | DOI Listing |
Cancer Causes Control
January 2025
Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, State University of New York at Buffalo, 265 Farber Hall, Buffalo, NY, 14214, USA.
Purpose: Historical redlining, a 1930s-era form of residential segregation and proxy of structural racism, has been associated with breast cancer risk, stage, and survival, but research is lacking on how known present-day breast cancer risk factors are related to historical redlining. We aimed to describe the clustering of present-day neighborhood-level breast cancer risk factors with historical redlining and evaluate geographic patterning across the US.
Methods: This ecologic study included US neighborhoods (census tracts) with Home Owners' Loan Corporation (HOLC) grades, defined as having a score in the Historic Redlining Score dataset; 2019 Population Level Analysis and Community EStimates (PLACES) data; and 2014-2016 Environmental Justice Index (EJI) data.
Nat Commun
January 2025
Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
Spatial protein expression technologies can map cellular content and organization by simultaneously quantifying the expression of >40 proteins at subcellular resolution within intact tissue sections and cell lines. However, necessary image segmentation to single cells is challenging and error prone, easily confounding the interpretation of cellular phenotypes and cell clusters. To address these limitations, we present STARLING, a probabilistic machine learning model designed to quantify cell populations from spatial protein expression data while accounting for segmentation errors.
BMJ Open Gastroenterol
January 2025
Biomedical Sciences, Wollo University, Dessie, Ethiopia.
Objective: Gallstone disease is a prevalent global health issue, but its impact in Africa remains unclear. This study aims to summarise and synthesise available data on the prevalence of gallstone disease across populations in Africa.
Design: Systematic review and meta-analysis, reported in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines.
Microcirculation
January 2025
Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
Coronary microvascular disease (CMVD) affects the coronary pre-arterioles, arterioles, and capillaries and can lead to blood supply-demand mismatch and cardiac ischemia. CMVD can present clinically as ischemia or myocardial infarction with no obstructive coronary arteries (INOCA or MINOCA, respectively). Currently, therapeutic options for CMVD are limited, and there are no targeted therapies.
J Biomed Inform
January 2025
Objective: Medical laboratory data together with prescribing and hospitalisation records are three of the most used electronic health records (EHRs) for data-driven health research. In Scotland, hospitalisation, prescribing and the death register data are available nationally whereas laboratory data is captured, stored and reported from local health board systems with significant heterogeneity. For researchers or other users of this regionally curated data, working on laboratory datasets across regional cohorts requires effort and time.