Publications by Michael F Lin | LitMetric

Publications by authors named "Michael F Lin"

GA4GH: International policies and standards for data sharing across genomic research and healthcare.

Heidi L Rehm, Angela J H Page, Lindsay Smith, Jeremy B Adams, Gil Alterovitz, Melissa Cline, Michael F Lin, Mikael Linden, Xianglin Liu, Lincoln D Stein

Cell Genom

November 2021

The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution.

Accurate, scalable cohort variant calls using DeepVariant and GLnexus.

Taedong Yun, Helen Li, Pi-Chuan Chang, Michael F Lin, Andrew Carroll

Bioinformatics

April 2021

Motivation: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging.

Results: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths.

Sparse Project VCF: efficient encoding of population genotype matrices.

Michael F Lin, Xiaodong Bai, William J Salerno, Jeffrey G Reid

Bioinformatics

April 2021

Article Synopsis

Variant Call Format (VCF) is commonly used for representing genetic information but becomes very large with extensive data from population studies.
Sparse Project VCF (spVCF) is introduced as a more efficient version of VCF, reducing file sizes by over 10 times while retaining essential information.
The spVCF format is compatible with existing VCF systems and has been validated using data from large whole-exome sequencing projects, like DiscovEHR and UK Biobank.

IDseq-An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring.

Katrina L Kalantar, Tiago Carvalho, Charles F A de Bourcy, Boris Dimitrov, Greg Dingle, Michael F Lin

Gigascience

October 2020

Background: Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically require bioinformatics expertise and access to local server-class hardware resources.

A Color Flow Tract in Ultrasound-Guided Random Renal Core Biopsy Predicts Complications.

Marie-Helene Gagnon, Michael F Lin, Samantha Lancia, Amber Salter, Motoyo Yano

J Ultrasound Med

July 2020

Objectives: To determine patient and procedural risk factors for major complications in ultrasound (US)-guided random renal core biopsy.

Methods: Random renal biopsies performed by radiologists in the US department at a single institution between 2014 and 2018 were retrospectively reviewed. The patient's age, sex, race, and estimated glomerular filtration rate (eGFR) were recorded.

A strategy for building and using a human reference pangenome.

Bastien Llamas, Giuseppe Narzisi, Valerie Schneider, Peter A Audano, Evan Biederstedt, Michael F Lin

F1000Res

October 2019

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes.

Variation graph toolkit improves read mapping by representing genetic variation in the reference.

Erik Garrison, Jouni Sirén, Adam M Novak, Glenn Hickey, Jordan M Eizenga, Michael F Lin

Nat Biotechnol

October 2018

Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias.

Formulating a Treatment Plan in Suspected Lymphoma: Ultrasound-Guided Core Needle Biopsy Versus Core Needle Biopsy and Fine-Needle Aspiration of Peripheral Lymph Nodes.

Monica R Drylewicz, Marcus P Watkins, Anup S Shetty, Michael F Lin, Amber Salter

J Ultrasound Med

March 2019

Objectives: Image-guided tissue sampling in the workup of suspected lymphoma can be performed by core needle biopsy (CNB) or CNB with fine-needle aspiration (FNA). We compared the yield of clinically actionable diagnoses between these methods of tissue sampling.

Methods: All ultrasound-guided percutaneous peripheral lymph node biopsies from 2010 to 2017 at a single institution were retrospectively reviewed for biopsy type (CNB versus CNB + FNA), prior diagnosis of lymphoma, size of the target lymph node, number of cores, length of core specimens, and pathologic diagnosis.

Evolutionary Dynamics of Abundant Stop Codon Readthrough.

Irwin Jungreis, Clara S Chan, Robert M Waterhouse, Gabriel Fields, Michael F Lin

Mol Biol Evol

December 2016

Translational stop codon readthrough emerged as a major regulatory mechanism affecting hundreds of genes in animal genomes, based on recent comparative genomics and ribosomal profiling evidence, but its evolutionary properties remain unknown. Here, we leverage comparative genomic evidence across 21 Anopheles mosquitoes to systematically annotate readthrough genes in the malaria vector Anopheles gambiae, and to provide the first study of abundant readthrough evolution, by comparison with 20 Drosophila species. Using improved comparative genomics methods for detecting readthrough, we identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and of 51 additional Drosophila melanogaster stop codons, including several cases of double and triple readthrough and of readthrough of two adjacent stop codons.

Ebola Virus Epidemiology, Transmission, and Evolution during Seven Months in Sierra Leone.

Daniel J Park, Gytis Dudas, Shirlee Wohl, Augustine Goba, Shannon L M Whitmer, Lina M Moses, Aaron E Lin, Mike Flint, Michael F Lin, John S Schieffelin

Cell

June 2015

The 2013-2015 Ebola virus disease (EVD) epidemic is caused by the Makona variant of Ebola virus (EBOV). Early in the epidemic, genome sequencing provided insights into virus evolution and transmission and offered important information for outbreak response. Here, we analyze sequences from 232 patients sampled over 7 months in Sierra Leone, along with 86 previously released genomes from earlier in the epidemic.

FRESCo: finding regions of excess synonymous constraint in diverse viruses.

Rachel S Sealfon, Michael F Lin, Irwin Jungreis, Maxim Y Wolf, Manolis Kellis

Genome Biol

February 2015

Background: The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals.

Histogram analysis for characterization of indeterminate adrenal nodules on noncontrast CT.

Michael F Lin, Lauren Q Chang-Sen, Jay P Heiken, Thomas K Pilgram, Kyongtae T Bae

Abdom Imaging

August 2015

Objective: To determine the effectiveness of the CT histogram method to characterize indeterminate adrenal nodules above 10 Hounsfield units (HU) on noncontrast CT.

Materials And Methods: Retrospective review of clinical CT data from January 2005 through 2008 identified 194 indeterminate adrenal nodules (>10 HU on noncontrast CT) in 175 patients. 20 nodules in 18 patients were excluded due to large standard deviation (SD > 30) of HU values.

The effect of donor kidney volume on recipient outcomes: "dose" matters.

Anitha Vijayan, Motoyo Yano, Vamsi R Narra, Kelsey Hoffman, Thomas K Pilgram, Michael F Lin

Transplantation

April 2013

Renal measurements on CT angiograms: correlation with graft function at living donor renal transplantation.

Motoyo Yano, Michael F Lin, Kelsey A Hoffman, Anitha Vijayan, Thomas K Pilgram

Radiology

October 2012

Purpose: To determine which measurement of donor renal size on computed tomographic (CT) angiograms has the greatest correlation with renal function preoperatively in the donor and postoperatively in the transplant recipient.

Materials And Methods: Informed consent was waived for this retrospective HIPAA-compliant study approved by the institutional review board. Renal length, total volume, and cortical volume were measured on renal donor CT angiograms in 111 patients.

Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis.

Andrea Pauli, Eivind Valen, Michael F Lin, Manuel Garber, Nadine L Vastenhouw, Lin Fan, Albin Sandelin

Genome Res

March 2012

Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time-series of RNA-seq experiments at eight stages during early zebrafish development.

Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes.

Michael F Lin, Pouya Kheradpour, Stefan Washietl, Brian J Parker, Jakob S Pedersen

Genome Res

November 2011

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species.

Evidence of abundant stop codon readthrough in Drosophila and other metazoa.

Irwin Jungreis, Michael F Lin, Rebecca Spokony, Clara S Chan, Nicolas Negre

Genome Res

December 2011

While translational stop codon readthrough is often used by viral genomes, it has been observed for only a handful of eukaryotic genes. We previously used comparative genomics evidence to recognize protein-coding regions in 12 species of Drosophila and showed that for 149 genes, the open reading frame following the stop codon has a protein-coding conservation signature, hinting that stop codon readthrough might be common in Drosophila. We return to this observation armed with deep RNA sequence data from the modENCODE project, an improved higher-resolution comparative genomics metric for detecting protein-coding regions, comparative sequence information from additional species, and directed experimental evidence.

A high-resolution map of human evolutionary constraint using 29 mammals.

Kerstin Lindblad-Toh, Manuel Garber, Or Zuk, Michael F Lin, Brian J Parker, David Dooling

Nature

October 2011

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.

PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions.

Michael F Lin, Irwin Jungreis, Manolis Kellis

Bioinformatics

July 2011

Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models.

Results: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study.

Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters.

Tiffany Hung, Yulei Wang, Michael F Lin, Ashley K Koegel, Yojiro Kotake, Hugo M Horlings

Nat Genet

June 2011

Transcription of long noncoding RNAs (lncRNAs) within gene regulatory elements can modulate gene activity in response to external stimuli, but the scope and functions of such activity are not known. Here we use an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations. We identify 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle, show altered expression in human cancers and are regulated in expression by specific oncogenic stimuli, stem cell differentiation or DNA damage.

Comparative functional genomics of the fission yeasts.

Nicholas Rhind, Zehua Chen, Moran Yassour, Dawn A Thompson, Brian J Haas, Michael F Lin, Aaron M Berlin, Lin Fan, Carolin A Müller

Science

May 2011

The fission yeast clade--comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S.

Error and error mitigation in low-coverage genome assemblies.

Melissa J Hubisz, Michael F Lin, Manolis Kellis, Adam Siepel

PLoS One

February 2011

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1-4 errors per kilobase.

Identification of functional elements and regulatory circuits by Drosophila modENCODE.

Science

December 2010

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction.

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.

Kim D Pruitt, Jennifer Harrow, Rachel A Harte, Craig Wallin, Mark Diekhans, Sarah Ayling, Michael F Lin

Genome Res

July 2009

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers.

Evolution of pathogenicity and sexual reproduction in eight Candida genomes.

Geraldine Butler, Matthew D Rasmussen, Michael F Lin, Manuel A S Santos, Sharadha Sakthikumar

Nature

June 2009

Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence.
