Publications by authors named "Yun Yoo"

Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a range of complex outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes that increased strongly after acute infection.

Background: Although the COVID-19 pandemic has persisted for over 3 years, reinfections with SARS-CoV-2 are not well understood. We aim to characterize reinfection, understand development of Long COVID after reinfection, and compare severity of reinfection with initial infection.

Methods: We use an electronic health record study cohort of over 3 million patients from the National COVID Cohort Collaborative as part of the NIH Researching COVID to Enhance Recovery Initiative.

Article Synopsis
  • Post-Acute Sequelae of SARS-CoV-2 infection (PASC), or Long-COVID, involves a range of complex health outcomes that arise after COVID-19, which are still not fully understood.
  • Researchers analyzed over 600 million diagnoses from 14 million patients to create detailed clinical categories and examined patients' health outcomes over time.
  • The study identified numerous health conditions that were more prevalent in COVID-19 patients compared to non-infected individuals, highlighting specific patterns based on factors like sex, age, and severity, which may lead to better diagnostics and understanding of Long-COVID.

Estimates of the incidence of post-acute sequelae of SARS-CoV-2 infection (PASC), also known as Long COVID, have varied across studies and changed over time. We estimated PASC incidence among adult and pediatric populations in three nationwide research networks of electronic health records (EHR) participating in the RECOVER Initiative using different classification algorithms (computable phenotypes). Overall, 7% of children and 8.

Over recent decades, machine learning, an integral subfield of artificial intelligence, has revolutionized diverse sectors, enabling data-driven decisions with minimal human intervention. In particular, the field of educational assessment emerges as a promising area for machine learning applications, where students can be classified and diagnosed using their performance data. The objectives of Diagnostic Classification Models (DCMs), which provide a suite of methods for diagnosing students' cognitive states in relation to the mastery of necessary cognitive attributes for solving problems in a test, can be effectively addressed through machine learning techniques.

Bulk analyses of pancreatic ductal adenocarcinoma (PDAC) samples are complicated by the tumor microenvironment (TME), i.e. signals from fibroblasts, endocrine, exocrine, and immune cells.

Background: AKI is associated with mortality in patients hospitalized with coronavirus disease 2019 (COVID-19); however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied.

Methods: Electronic health record data were obtained from 53 health systems in the United States in the National COVID Cohort Collaborative. We selected hospitalized adults diagnosed with COVID-19 between March 6, 2020, and January 6, 2022.

Objective: Clinical encounter data are heterogeneous and vary greatly from institution to institution. These problems of variance affect interpretability and usability of clinical encounter data for analysis. These problems are magnified when multisite electronic health record (EHR) data are networked together.

Although the COVID-19 pandemic has persisted for over 2 years, reinfections with SARS-CoV-2 are not well understood. We use the electronic health record (EHR)-based study cohort from the National COVID Cohort Collaborative (N3C) as part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative to characterize reinfection, understand development of Long COVID after reinfection, and compare severity of reinfection with initial infection. We validate previous findings of reinfection incidence (5.

Background: Acute kidney injury (AKI) is associated with mortality in patients hospitalized with COVID-19; however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied.

Methods: Electronic health record data were obtained from 53 health systems in the United States (US) in the National COVID Cohort Collaborative (N3C). We selected hospitalized adults diagnosed with COVID-19 between March 6th, 2020, and January 6th, 2022.

Article Synopsis
  • The study aimed to standardize and fill in missing units from electronic health records (EHRs) by developing a systematic method for converting and validating these measurements, focusing on COVID-19 research.
  • The researchers worked with over 3.1 billion patient records and 19,000 unique measurements, successfully harmonizing 88.1% of values and imputing units for 78.2% of records that initially lacked them.
  • This new approach enhances the ability to analyze diverse EHR data, making valuable information accessible for public health insights and research efforts.

Objective: The purpose of the study is to evaluate the relationship between HbA1c and severity of coronavirus disease 2019 (COVID-19) outcomes in patients with type 2 diabetes (T2D) with acute COVID-19 infection.

Research Design And Methods: We conducted a retrospective study using observational data from the National COVID Cohort Collaborative (N3C), a longitudinal, multicenter U.S.

Importance: Understanding of SARS-CoV-2 infection in US children has been limited by the lack of large, multicenter studies with granular data.

Objective: To examine the characteristics, changes over time, outcomes, and severity risk factors of children with SARS-CoV-2 within the National COVID Cohort Collaborative (N3C).

Design, Setting, And Participants: A prospective cohort study of encounters with end dates before September 24, 2021, was conducted at 56 N3C facilities throughout the US.

Importance: SARS-CoV-2.

Objective: To determine the characteristics, changes over time, outcomes, and severity risk factors of SARS-CoV-2 affected children within the National COVID Cohort Collaborative (N3C).

Design: Prospective cohort study of patient encounters with end dates before May 27th, 2021.

Article Synopsis
  • The National COVID Cohort Collaborative (N3C) is a massive electronic health record database that provides valuable insights into COVID-19, supporting the development of better diagnostic tools and clinical practices.
  • This study analyzed data from nearly 2 million adults across 34 medical centers to evaluate the severity of COVID-19 and its risk factors over time, using advanced machine learning techniques to predict severe outcomes.
  • Among the 174,568 adults infected with SARS-CoV-2, a significant portion experienced severe illness, highlighting the need for continuous monitoring and adjustment of treatment approaches based on demographic characteristics and disease severity.
Article Synopsis
  • The National COVID Cohort Collaborative (N3C) is the largest U.S. COVID-19 patient database, created to provide a comprehensive analysis of clinical characteristics, disease progression, and treatment outcomes across multiple health centers, enhancing predictive and diagnostic tools for COVID-19.
  • A study involving over 1.9 million patients from 34 medical centers found significant clinical data, showing that certain factors like age, sex, and underlying conditions affect disease severity, with a notable decrease in mortality rates among hospitalized patients over time.
  • The N3C dataset was utilized in machine learning models to successfully predict severe outcomes in COVID-19 patients, achieving high accuracy rates and demonstrating the potential of using electronic health

Summary: For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package, gpart, which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image.

Pathway-based analysis in genome-wide association studies (GWAS) is widely used to uncover novel multi-genic functional associations. Many pathway-based methods have been used to test the enrichment of associated genes in pathways, but they exhibited low power and were highly affected by free parameters. We present a novel method and software, GSA-SNP2, for pathway enrichment analysis of GWAS P-value data.

Motivation: Linkage disequilibrium (LD) block construction is required for research in population genetics and genetic epidemiology, including specification of sets of single nucleotide polymorphisms (SNPs) for analysis of multi-SNP based association and identification of haplotype blocks in high-density sequencing data. Existing methods based on a narrow-sense definition do not allow intermediate regions of low LD between strongly associated SNP pairs and tend to split high-density SNP data into small blocks having high between-block correlation.

Results: We present Big-LD, a block partition method based on interval graph modeling of LD bins, which are clusters of SNPs in strong pairwise LD that are not necessarily physically consecutive.

Many researchers have found that one of the most important characteristics of the structure of linkage disequilibrium is that the human genome can be divided into non-overlapping block partitions in which only a small number of haplotypes are observed. The location and distribution of haplotype blocks can be seen as a population property influenced by population genetic events such as selection, mutation, recombination and population structure. In this study, we investigate the effects of the density of markers relative to the full set of all polymorphisms in the region on the results of haplotype partitioning for five popular haplotype block partition methods: three methods in Haploview (confidence interval, four gamete test, and solid spine), MIG++ implemented in PLINK 1.

By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive.

Gene-based analysis of multiple single nucleotide polymorphisms (SNPs) in a gene region is an alternative to single SNP analysis. The multi-bin linear combination test (MLC) proposed in previous studies utilizes the correlation among SNPs within a gene to construct a gene-based global test. SNPs are partitioned into clusters of highly correlated SNPs, and the MLC test statistic quadratically combines linear combination statistics constructed for each cluster.

Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test.

Objective: Although genome-wide association studies (GWAS) have substantially contributed to understanding genetic architecture, unidentified variants for complex traits remain an issue. One efficient approach is to improve the power of a GWAS scan by weighting P values with prior linkage signals. Our objective was to identify novel candidates for obesity in Asian populations by using gene-mapping strategies that combine linkage and association analyses.

The estimated glomerular filtration rate is a well-known measure of renal function and is widely used to follow the course of disease. Although there have been several investigations establishing the genetic background contributing to renal function, Asian populations have rarely been used in these genome-wide studies. Here, we aimed to find candidate genetic determinants of renal function in 1007 individuals from 73 extended families of Mongolian origin.

Synopsis of recent research by authors named "Yun Yoo"

  • Yun Yoo's recent research predominantly investigates the long-term health effects of SARS-CoV-2, particularly focusing on the complexities of Long-COVID through analyses of large electronic health record datasets from national COVID initiatives like N3C and RECOVER.
  • The findings highlight the prevalence of post-acute sequelae and cognitive impairments following COVID-19 infections, demonstrating significant increases in specific clinical phenotypes post-infection, as well as differences in reinfection rates and outcomes compared to initial infections.
  • Yoo's work also explores the heterogeneity and interpretability issues of clinical encounter data in networked settings, emphasizing the importance of accurate data classification techniques to enhance understanding of patient trajectories and health outcomes.