Publications by Volodymyr Kuleshov

QuIP: 2-Bit Quantization of Large Language Models With Guarantees.

Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa

Adv Neural Inf Process Syst

December 2023

This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e. …

Online Calibrated and Conformal Prediction Improves Bayesian Optimization.

Shachi Deshpande, Charles Marx, Volodymyr Kuleshov

Proc Mach Learn Res

May 2024

Accurate uncertainty estimates are important in sequential model-based decision-making tasks such as Bayesian optimization. However, these estimates can be imperfect if the data violates assumptions made by the model (e.g. …

Cross-species modeling of plant genomes at single nucleotide resolution using a pre-trained DNA language model.

Jingjing Zhai, Aaron Gokaslan, Yair Schiff, Ana Berthel, Zong-Yan Liu, Volodymyr Kuleshov

bioRxiv

August 2024

Interpreting function and fitness effects in diverse plant genomes requires transferable models. Language models (LMs) pre-trained on large-scale biological sequences can learn evolutionary conservation and, when fine-tuned on limited labeled data, offer better cross-species prediction than supervised models. We introduce PlantCaduceus, a plant DNA LM based on the Caduceus and Mamba architectures, pre-trained on a curated dataset of 16 Angiosperm genomes.

A machine-compiled database of genome-wide association studies.

Volodymyr Kuleshov, Jialin Ding, Christopher Vo, Braden Hancock, Alexander Ratner

Nat Commun

July 2019

Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases).

A guide to deep learning in healthcare.

Andre Esteva, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo

Nat Med

January 2019

Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data.

Fast Metagenomic Binning via Hashing and Bayesian Clustering.

Victoria Popic, Volodymyr Kuleshov, Michael Snyder, Serafim Batzoglou

J Comput Biol

July 2018

We introduce GATTACA, a framework for fast unsupervised binning of metagenomic contigs. Similar to recent approaches, GATTACA clusters contigs based on their coverage profiles across a large cohort of metagenomic samples; however, unlike previous methods that rely on read mapping, GATTACA quickly estimates these profiles from k-mer counts stored in a compact index. This approach can yield a speedup of more than an order of magnitude while matching the accuracy of earlier methods on synthetic and real data benchmarks.
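The central idea in this abstract, estimating each contig's per-sample coverage from k-mer counts instead of mapping reads, can be illustrated with a minimal Python sketch. It is not GATTACA's implementation: the function names are hypothetical, a plain `Counter` stands in for the compact index, the median estimator is an assumption, strand (reverse complements) is ignored, and the Bayesian clustering step that groups contigs by their profiles is not shown.

```python
from collections import Counter
from statistics import median
from typing import Iterator, List, Sequence


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (strand is ignored in this sketch)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_counts(reads: Sequence[str], k: int) -> Counter:
    """Count k-mers over all reads of one sample.

    Assumption: a plain Counter stands in for a compact k-mer index; it uses
    far more memory but answers the same count queries.
    """
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def coverage_profile(contig: str, sample_counts: Sequence[Counter], k: int) -> List[float]:
    """Estimate a contig's coverage in each sample without read mapping.

    For each sample, look up the count of every k-mer in the contig and take
    the median, which is less sensitive than the mean to k-mers shared with
    other genomes or corrupted by sequencing errors.
    """
    profile: List[float] = []
    for counts in sample_counts:
        hits = [counts[km] for km in kmers(contig, k)]  # missing k-mers count as 0
        profile.append(median(hits) if hits else 0.0)
    return profile


# Toy data: contig 1 is more abundant in sample A, contig 2 in sample B.
sample_a = build_kmer_counts(["ACGTAC", "ACGTAC", "GGCATT"], k=3)
sample_b = build_kmer_counts(["ACGTAC", "GGCATT", "GGCATT"], k=3)
print(coverage_profile("ACGTAC", [sample_a, sample_b], k=3))  # [2.0, 1.0]
print(coverage_profile("GGCATT", [sample_a, sample_b], k=3))  # [1.0, 2.0]
```

Contigs from the same genome should show similar profiles across samples, which is what makes these vectors useful as features for the clustering step.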

Genome assembly from synthetic long read clouds.

Volodymyr Kuleshov, Michael P Snyder, Serafim Batzoglou

Bioinformatics

June 2016

Motivation: Despite rapid progress in sequencing technology, the de novo assembly of genomes from new species and the reconstruction of complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and by the inability of current assembly paradigms to cope with combinations of short and long reads.

Results: Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies.

Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome.

Volodymyr Kuleshov, Chao Jiang, Wenyu Zhou, Fereshteh Jahanbani, Serafim Batzoglou

Nat Biotechnol

January 2016

Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone.

Probabilistic single-individual haplotyping.

Volodymyr Kuleshov

Bioinformatics

September 2014

Motivation: Accurate haplotyping (determining from which parent particular portions of the genome are inherited) is still largely an unresolved problem in genomics. This problem has only recently started to become tractable, thanks to the development of new long-read sequencing technologies. Here, we introduce ProbHap, a haplotyping algorithm targeted at such technologies.

Whole-genome haplotyping using long reads and statistical methods.

Volodymyr Kuleshov, Dan Xie, Rui Chen, Dmitry Pushkarev, Zhihai Ma

Nat Biotechnol

March 2014

The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing.
