Publications by authors named "Victoria Popic"

Article Synopsis
  • Finding relatives in genomic studies is difficult when data are spread across multiple organizations with data-sharing restrictions.
  • SF-Relate is a new federated algorithm that uses locality-sensitive hashing to efficiently and securely identify genetic relatives: it groups individuals into buckets and compares only individuals within the same bucket.
  • It ensures privacy through multiparty homomorphic encryption, allowing secure computation of relatedness without any private data being shared, successfully identifying 97% of close relatives in large datasets like the UK Biobank.
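The bucketing step can be illustrated with a minimal sketch. It assumes bit-sampling LSH over 0/1/2 genotype vectors as a stand-in; the synopsis does not specify SF-Relate's exact hash family, and the names `lsh_buckets` and `candidate_pairs` are hypothetical. Encryption is omitted entirely.

```python
import random
from collections import defaultdict
from itertools import combinations


def lsh_buckets(genotypes, n_tables=4, k=8, seed=0):
    """Bucket individuals by their genotypes at k randomly sampled sites.

    genotypes: dict mapping individual id -> tuple of 0/1/2 genotypes
    (hypothetical encoding). Individuals that share long identical
    stretches, as close relatives tend to, collide in at least one
    table with higher probability than unrelated individuals.
    """
    rng = random.Random(seed)
    n_sites = len(next(iter(genotypes.values())))
    tables = []
    for _ in range(n_tables):
        sites = rng.sample(range(n_sites), k)  # sites used as this table's hash key
        buckets = defaultdict(list)
        for ind, g in genotypes.items():
            buckets[tuple(g[s] for s in sites)].append(ind)
        tables.append(buckets)
    return tables


def candidate_pairs(tables):
    """Only pairs that share a bucket in some table are compared later."""
    pairs = set()
    for buckets in tables:
        for members in buckets.values():
            pairs.update(combinations(sorted(members), 2))
    return pairs
```

Only the candidate pairs would then go through the expensive (in SF-Relate, encrypted) relatedness computation, which is what avoids an all-pairs comparison.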
Article Synopsis
  • Finding relatives is crucial for genomic studies, but data-sharing restrictions make this difficult when data are held by different entities.
  • SF-Relate is a federated algorithm that uses locality-sensitive hashing to efficiently identify genetic relatives while preserving privacy.
  • By using multiparty homomorphic encryption, SF-Relate allows data holders to compute relatedness without sharing sensitive information, achieving high detection rates in large datasets like the UK Biobank.
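The synopsis does not name the relatedness statistic. As an illustrative assumption, the sketch below computes a KING-style within-family kinship estimate in plaintext, phi = 1/2 - sum((x_i - x_j)^2) / (2 * (H_i + H_j)), where H counts heterozygous sites. In the federated setting, this kind of arithmetic is what would be evaluated under encryption; `king_kinship` is a hypothetical name.

```python
def king_kinship(x_i, x_j):
    """KING-style within-family kinship estimate from 0/1/2 genotype vectors.

    Returns about 0.5 for identical genotypes and about 0.25 for
    first-degree relatives; unrelated pairs fall near or below 0.
    """
    if len(x_i) != len(x_j):
        raise ValueError("genotype vectors must have equal length")
    # Squared genotype distance: 1 per single-allele difference, 4 per opposite homozygotes.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    # Total heterozygous sites across both individuals.
    het = sum(a == 1 for a in x_i) + sum(b == 1 for b in x_j)
    if het == 0:
        raise ValueError("no heterozygous sites; estimate undefined")
    return 0.5 - sq_dist / (2 * het)
```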

Gene fusions act as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential for cancer diagnosis and prognosis and for guiding therapeutic development. Most currently available methods for fusion transcript detection are designed for Illumina RNA-seq, which produces highly accurate short reads.

Article Synopsis
  • Long-read RNA-sequencing methods can capture full transcript isoforms but have traditionally had low throughput.
  • The new technique, multiplexed arrays isoform sequencing (MAS-ISO-seq), addresses this by combining multiple cDNAs into a single array for long-read sequencing, increasing throughput more than 15-fold.
  • In experiments with tumor-infiltrating T cells, MAS-ISO-seq yielded a 12- to 32-fold increase in the number of differentially spliced genes identified.
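Because several cDNAs share one long read, each read must be split back into its segments. The sketch below assumes exact matches to two made-up adapter sequences (`ADAPTERS` is hypothetical; the synopsis gives no sequences). Real reads contain sequencing errors, so a practical tool would locate adapters by approximate alignment rather than exact string search.

```python
import re

# Hypothetical segmentation adapters; not the actual MAS-ISO-seq sequences.
ADAPTERS = ["AAAACCCC", "GGGGTTTT"]


def split_array_read(read, adapters=ADAPTERS):
    """Split a concatenated (arrayed) long read into its cDNA segments.

    Exact-match splitting only: any adapter occurrence is treated as a
    boundary, and empty segments (e.g. a read starting with an adapter)
    are dropped.
    """
    pattern = "|".join(re.escape(a) for a in adapters)
    return [seg for seg in re.split(pattern, read) if seg]
```

For example, a read `"ACGT" + "AAAACCCC" + "TTGA" + "GGGGTTTT" + "CCAT"` splits into the three segments `ACGT`, `TTGA`, and `CCAT`.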
Article Synopsis
  • Structural variants (SVs) significantly contribute to genetic diversity and disease, highlighting the need for better detection methods in precision medicine.
  • Existing methods for detecting SVs are limited because they rely on manual features and rules, which don't scale well to the wide diversity of SVs in genomic data.
  • The Cue framework uses deep learning to analyze sequencing data by converting alignments into images and employing a convolutional neural network to accurately predict SV types, achieving superior performance compared to current methods.
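To make "converting alignments into images" concrete, here is a minimal sketch that rasterizes read-pair mapping positions within a genomic interval into a count matrix, one plausible image channel. The input format and the name `pairs_to_image` are assumptions; Cue's actual image encoding and the CNN are not shown.

```python
def pairs_to_image(pairs, start, end, n_bins):
    """Rasterize read pairs in [start, end) into an n_bins x n_bins count matrix.

    pairs: iterable of (pos1, pos2) mapping positions of the two mates
    (hypothetical input). Pixel (r, c) counts pairs whose first mate falls
    in bin r and second mate in bin c; pairs far off the diagonal hint at
    unusually large insert sizes, as produced by deletions.
    """
    span = end - start
    image = [[0] * n_bins for _ in range(n_bins)]
    for p1, p2 in pairs:
        if start <= p1 < end and start <= p2 < end:
            # Integer arithmetic keeps bin indices exact and below n_bins.
            r = (p1 - start) * n_bins // span
            c = (p2 - start) * n_bins // span
            image[r][c] += 1
    return image
```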

Motivation: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high-confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume that the evolutionary progression of SVs is necessarily the same as that of SNVs, we show that a tumor phylogeny tree built from high-quality somatic SNVs can act as a guide for calling somatic SVs and assigning them to the tree.
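The idea of using the SNV tree as a guide can be illustrated with a deliberately simplified sketch: each SV is assigned to the tree node whose per-sample presence profile it matches best, and the mismatch count can serve as a crude confidence signal. This nearest-profile rule is a stand-in, not Meltos's optimization formulation; `assign_sv_to_node` and the input encoding are hypothetical.

```python
def assign_sv_to_node(sv_presence, node_presence):
    """Assign an SV to the tree node whose sample-presence profile best matches it.

    sv_presence: tuple of 0/1 per sample (SV detected in that sample or not).
    node_presence: dict node name -> tuple of 0/1 per sample, derived from
    the SNV-based tree. Ties are broken by node name for determinism.
    Returns (best_node, number_of_mismatched_samples).
    """
    def mismatches(profile):
        return sum(a != b for a, b in zip(sv_presence, profile))

    best = min(node_presence, key=lambda n: (mismatches(node_presence[n]), n))
    return best, mismatches(node_presence[best])
```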


Reconstructing cancer phylogeny trees and quantifying the evolution of the disease is challenging. LICHeE and BAMSE are two recently developed computational tools for this purpose. Both use the estimated variant allele fractions (VAFs) of somatic mutations across multiple samples to infer the most likely cancer phylogenies.
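One way VAFs constrain a phylogeny is the sum rule: in each sample, a clone's VAF should be at least the combined VAF of its child clones, since the children's mutations occur in cells that also carry the parent's. The sketch below checks a candidate tree against that rule. It is a generic illustration under that assumption, not the exact scoring used by LICHeE or BAMSE; `satisfies_sum_rule` is a hypothetical name.

```python
def satisfies_sum_rule(tree, vaf, tol=1e-9):
    """Check the VAF sum rule for every parent in every sample.

    tree: dict parent -> list of children.
    vaf: dict node -> list of per-sample VAFs (e.g. the mean VAF of the
    node's mutation cluster). tol absorbs floating-point and noise slack.
    """
    for parent, children in tree.items():
        for s, parent_vaf in enumerate(vaf[parent]):
            if sum(vaf[c][s] for c in children) > parent_vaf + tol:
                return False
    return True
```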


We introduce GATTACA, a framework for fast unsupervised binning of metagenomic contigs. Similar to recent approaches, GATTACA clusters contigs based on their coverage profiles across a large cohort of metagenomic samples; however, unlike previous methods that rely on read mapping, GATTACA quickly estimates these profiles from kmer counts stored in a compact index. This approach can result in over an order of magnitude speedup, while matching the accuracy of earlier methods on synthetic and real data benchmarks.
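The core shortcut, estimating a contig's per-sample coverage from k-mer counts instead of read mapping, can be sketched as the mean count of the contig's k-mers in each sample. Plain per-sample dictionaries stand in for GATTACA's compact index, and reverse-complement (canonical) k-mers are ignored for brevity; both simplifications are assumptions.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def coverage_profile(contig, sample_counts, k):
    """Estimate the contig's coverage in each sample as the mean count of its k-mers.

    sample_counts: list of dicts (one per sample) mapping k-mer -> count.
    The resulting vector is the coverage profile used to cluster contigs.
    """
    contig_kmers = list(kmers(contig, k))
    if not contig_kmers:
        raise ValueError("contig is shorter than k")
    return [
        sum(counts.get(km, 0) for km in contig_kmers) / len(contig_kmers)
        for counts in sample_counts
    ]
```

Contigs from the same genome should have correlated profiles across a large cohort, so the profiles can be passed to any clustering method.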


Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise Balaur, a privacy-preserving read mapper for hybrid clouds based on locality-sensitive hashing and kmer voting.
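Kmer voting can be sketched as follows: every read kmer that occurs in the reference votes for the read start position it implies, and the most-voted position wins. To loosely mirror the privacy goal, kmers are replaced by keyed HMAC-SHA256 digests so the voting party handles digests rather than raw sequence. This is an illustrative assumption, not Balaur's actual protocol, and the locality-sensitive hashing stage is omitted; all function names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter, defaultdict


def kmer_digest(kmer, key):
    # Keyed hash: without the key, digests do not reveal the underlying kmer.
    return hmac.new(key, kmer.encode(), hashlib.sha256).hexdigest()


def build_hashed_index(reference, k, key):
    """Map each reference kmer digest to the positions where the kmer occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[kmer_digest(reference[i:i + k], key)].append(i)
    return index


def vote_position(read, index, k, key):
    """Each read kmer votes for (reference position - offset in read).

    Returns (best_start, votes), or None if no kmer hits the index.
    """
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(kmer_digest(read[j:j + k], key), []):
            votes[pos - j] += 1
    return votes.most_common(1)[0] if votes else None
```

Repeated kmers (here, `TACG` occurs twice in the example reference below) cast spurious votes, but the true start position still collects the most.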


Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples.
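Multi-sample lineage inference typically starts by summarizing each somatic mutation as a binary presence profile across samples, so that mutations with the same profile can be grouped as candidate lineage markers. The sketch below shows that grouping step; the VAF threshold of 0.05 and the function names are hypothetical choices, and the later clustering and tree search are not shown.

```python
from collections import defaultdict


def presence_profile(vafs, threshold=0.05):
    """Binary string with '1' where the mutation's VAF exceeds the threshold."""
    return "".join("1" if v > threshold else "0" for v in vafs)


def group_by_profile(snv_vafs, threshold=0.05):
    """Group mutations that are present in exactly the same set of samples.

    snv_vafs: dict mutation id -> list of per-sample VAFs.
    Returns dict profile string -> list of mutation ids.
    """
    groups = defaultdict(list)
    for snv, vafs in snv_vafs.items():
        groups[presence_profile(vafs, threshold)].append(snv)
    return dict(groups)
```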


Summary: The increasing availability of high-throughput sequencing technologies has led to the sequencing of thousands of human genomes in recent years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date, no method can map reads from a newly sequenced human genome to a large collection of genomes.
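One way to represent a collection of genomes as a single reference is to encode SNP positions with IUPAC nucleotide ambiguity codes, so that a read carrying any of the known alleles matches. The sketch below assumes that representation (the summary does not describe the method's data structure) and uses a naive scan for clarity; a real mapper would index the reference instead. `matches_at` and `find_matches` are hypothetical names.

```python
# Subset of IUPAC nucleotide ambiguity codes: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def matches_at(read, ref, pos):
    """True if read matches ref at pos, where each ambiguity code matches any of its bases."""
    if pos < 0 or pos + len(read) > len(ref):
        return False
    return all(b in IUPAC[r] for b, r in zip(read, ref[pos:pos + len(read)]))


def find_matches(read, ref):
    """All exact-match positions of read in the ambiguity-coded reference."""
    return [p for p in range(len(ref) - len(read) + 1) if matches_at(read, ref, p)]
```

With the multi-genome reference `ACRTGY` (R = A/G, Y = C/T), reads `ACAT` and `ACGT` both match at position 0, one per allele.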
