Publications by authors named "Michael K Yu"

Plasmids alter microbial evolution and lifestyles by mobilizing genes that often confer fitness in changing environments across clades. Yet our ecological and evolutionary understanding of naturally occurring plasmids is far from complete. Here we developed a machine-learning model, PlasX, which identified 68,350 non-redundant plasmids across human gut metagenomes and organized them into 1,169 evolutionarily cohesive 'plasmid systems' using our sequence containment-aware network-partitioning algorithm, MobMess.

Plasmids are extrachromosomal genetic elements that often encode fitness-enhancing features. However, many bacteria carry "cryptic" plasmids that do not confer clear beneficial functions. We identified one such cryptic plasmid, pBI143, which is ubiquitous across industrialized gut microbiomes and is 14 times as numerous as crAssphage, currently established as the most abundant extrachromosomal genetic element in the human gut.

A wide variety of human diseases are associated with loss of microbial diversity in the human gut, inspiring a great interest in the diagnostic or therapeutic potential of the microbiota. However, the ecological forces that drive diversity reduction in disease states remain unclear, rendering it difficult to ascertain the role of the microbiota in disease emergence or severity. One hypothesis to explain this phenomenon is that microbial diversity is diminished as disease states select for microbial populations that are more fit to survive environmental stress caused by inflammation or other host factors.

Background: Changes in microbial community composition as a function of human health and disease states have sparked remarkable interest in the human gut microbiome. However, establishing reproducible insights into the determinants of microbial succession in disease has been a formidable challenge.

Results: Here we use fecal microbiota transplantation (FMT) as an in natura experimental model to investigate the association between metabolic independence and resilience in stressed gut environments.

Plasmids are extrachromosomal genetic elements that often encode fitness-enhancing features. However, many bacteria carry 'cryptic' plasmids that do not confer clear beneficial functions. We identified one such cryptic plasmid, pBI143, which is ubiquitous across industrialized gut microbiomes and is 14 times as numerous as crAssphage, currently established as the most abundant genetic element in the human gut.

A major goal of cancer research is to understand how mutations distributed across diverse genes affect common cellular systems, including multiprotein complexes and assemblies. Two challenges—how to comprehensively map such systems and how to identify which are under mutational selection—have hindered this understanding. Accordingly, we created a comprehensive map of cancer protein systems integrating both new and published multi-omic interaction data at multiple scales of analysis.

Recent studies of the tumor genome seek to identify cancer pathways as groups of genes in which mutations are epistatic with one another or, specifically, "mutually exclusive." Here, we show that most mutations are mutually exclusive not due to pathway structure but to interactions with disease subtype and tumor mutation load. In particular, many cancer driver genes are mutated preferentially in tumors with few mutations overall, causing mutations in these cancer genes to appear mutually exclusive with numerous others.

Systems biology requires not only genome-scale data but also methods to integrate these data into interpretable models. Previously, we developed approaches that organize omics data into a structured hierarchy of cellular components and pathways, called a "data-driven ontology." Such hierarchies recapitulate known cellular subsystems and discover new ones.

A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology.

Although cancer genomes are replete with noncoding mutations, the effects of these mutations remain poorly characterized. Here we perform an integrative analysis of 930 tumor whole genomes and matched transcriptomes, identifying a network of 193 noncoding loci in which mutations disrupt target gene expression. These 'somatic eQTLs' (expression quantitative trait loci) are frequently mutated in specific cancer tissues, and the majority can be validated in an independent cohort of 3,382 tumors.

Article Synopsis
  • Gene networks are increasing in size and number, prompting an evaluation of which networks best identify disease gene sets from various research methods.
  • Out of 21 human genome-wide interaction networks assessed, STRING, ConsensusPathDB, and GIANT showed the highest effectiveness at recovering disease-related genes.
  • The study highlights that network performance generally improves with size, but the DIP network stands out for its efficiency, leading to the creation of a composite network to enhance disease research.

Although artificial neural networks are powerful classifiers, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) that couple the model's inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2,526 subsystems comprising a eukaryotic cell (http://d-cell.

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network.

Background: Approximately 12% of all ureteral stents placed are retained or "forgotten." Forgotten stents are associated with significant safety concerns as well as increased costs and legal issues. Retained ureteral stents (RUS) often occur due to lack of clinical follow-up, communication or language barriers, and economic concerns.

Background: Global but predictable changes impact the DNA methylome as we age, acting as a type of molecular clock. This clock can be hastened by conditions that decrease lifespan, raising the question of whether it can also be slowed, for example, by conditions that increase lifespan. Mice are particularly appealing organisms for studies of mammalian aging; however, epigenetic clocks have thus far been formulated only in humans.

Article Synopsis
  • Researchers developed a method called Active Interaction Mapping to organize biological functions, focusing on autophagy, a crucial recycling process linked to many diseases.
  • Using this approach, they created an initial model based on gene networks from yeast (Saccharomyces), capturing key elements of autophagy and their relationships to processes like vesicle transport and stress response.
  • By analyzing over 156,000 synthetic-lethal interactions, they significantly enhanced the model, identifying 220 functions related to autophagy, including previously unknown roles for specific proteins involved in the process.

Accurately translating genotype to phenotype requires accounting for the functional impact of genetic variation at many biological scales. Here we present a strategy for genotype-phenotype reasoning based on existing knowledge of cellular subsystems. These subsystems and their hierarchical organization are defined by the Gene Ontology or a complementary ontology inferred directly from previously published datasets.
