Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a variety of complex and varied outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes strongly increased after acute infection.
View Article and Find Full Text PDFBackground: Although the COVID-19 pandemic has persisted for over 3 years, reinfections with SARS-CoV-2 are not well understood. We aim to characterize reinfection, understand development of Long COVID after reinfection, and compare severity of reinfection with initial infection.
Methods: We use an electronic health record study cohort of over 3 million patients from the National COVID Cohort Collaborative as part of the NIH Researching COVID to Enhance Recovery Initiative.
Estimates of post-acute sequelae of SARS-CoV-2 infection (PASC) incidence, also known as Long COVID, have varied across studies and changed over time. We estimated PASC incidence among adult and pediatric populations in three nationwide research networks of electronic health records (EHR) participating in the RECOVER Initiative using different classification algorithms (computable phenotypes). Overall, 7% of children and 8.
View Article and Find Full Text PDFOver recent decades, machine learning, an integral subfield of artificial intelligence, has revolutionized diverse sectors, enabling data-driven decisions with minimal human intervention. In particular, the field of educational assessment emerges as a promising area for machine learning applications, where students can be classified and diagnosed using their performance data. The objectives of Diagnostic Classification Models (DCMs), which provide a suite of methods for diagnosing students' cognitive states in relation to the mastery of necessary cognitive attributes for solving problems in a test, can be effectively addressed through machine learning techniques.
View Article and Find Full Text PDFBulk analyses of pancreatic ductal adenocarcinoma (PDAC) samples are complicated by the tumor microenvironment (TME), i.e. signals from fibroblasts, endocrine, exocrine, and immune cells.
View Article and Find Full Text PDFBackground: AKI is associated with mortality in patients hospitalized with coronavirus disease 2019 (COVID-19); however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied.
Methods: Electronic health record data were obtained from 53 health systems in the United States in the National COVID Cohort Collaborative. We selected hospitalized adults diagnosed with COVID-19 between March 6, 2020, and January 6, 2022.
Objective: Clinical encounter data are heterogeneous and vary greatly from institution to institution. These problems of variance affect interpretability and usability of clinical encounter data for analysis. These problems are magnified when multisite electronic health record (EHR) data are networked together.
View Article and Find Full Text PDFAlthough the COVID-19 pandemic has persisted for over 2 years, reinfections with SARS-CoV-2 are not well understood. We use the electronic health record (EHR)-based study cohort from the National COVID Cohort Collaborative (N3C) as part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative to characterize reinfection, understand development of Long COVID after reinfection, and compare severity of reinfection with initial infection. We validate previous findings of reinfection incidence (5.
View Article and Find Full Text PDFBackground: Acute kidney injury (AKI) is associated with mortality in patients hospitalized with COVID-19, however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied.
Methods: Electronic health record data were obtained from 53 health systems in the United States (US) in the National COVID Cohort Collaborative (N3C). We selected hospitalized adults diagnosed with COVID-19 between March 6th, 2020, and January 6th, 2022.
Objective: The purpose of the study is to evaluate the relationship between HbA1c and severity of coronavirus disease 2019 (COVID-19) outcomes in patients with type 2 diabetes (T2D) with acute COVID-19 infection.
Research Design And Methods: We conducted a retrospective study using observational data from the National COVID Cohort Collaborative (N3C), a longitudinal, multicenter U.S.
Importance: Understanding of SARS-CoV-2 infection in US children has been limited by the lack of large, multicenter studies with granular data.
Objective: To examine the characteristics, changes over time, outcomes, and severity risk factors of children with SARS-CoV-2 within the National COVID Cohort Collaborative (N3C).
Design, Setting, And Participants: A prospective cohort study of encounters with end dates before September 24, 2021, was conducted at 56 N3C facilities throughout the US.
Importance: SARS-CoV-2.
Objective: To determine the characteristics, changes over time, outcomes, and severity risk factors of SARS-CoV-2 affected children within the National COVID Cohort Collaborative (N3C).
Design: Prospective cohort study of patient encounters with end dates before May 27th, 2021.
Summary: For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package gpart which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image.
View Article and Find Full Text PDFPathway-based analysis in genome-wide association study (GWAS) is being widely used to uncover novel multi-genic functional associations. Many of these pathway-based methods have been used to test the enrichment of the associated genes in the pathways, but exhibited low powers and were highly affected by free parameters. We present the novel method and software GSA-SNP2 for pathway enrichment analysis of GWAS P-value data.
View Article and Find Full Text PDFMotivation: Linkage disequilibrium (LD) block construction is required for research in population genetics and genetic epidemiology, including specification of sets of single nucleotide polymorphisms (SNPs) for analysis of multi-SNP based association and identification of haplotype blocks in high density sequencing data. Existing methods based on a narrow sense definition do not allow intermediate regions of low LD between strongly associated SNP pairs and tend to split high density SNP data into small blocks having high between-block correlation.
Results: We present Big-LD, a block partition method based on interval graph modeling of LD bins which are clusters of strong pairwise LD SNPs, not necessarily physically consecutive.
Genomics Inform
December 2016
Many researchers have found that one of the most important characteristics of the structure of linkage disequilibrium is that the human genome can be divided into non-overlapping block partitions in which only a small number of haplotypes are observed. The location and distribution of haplotype blocks can be seen as a population property influenced by population genetic events such as selection, mutation, recombination and population structure. In this study, we investigate the effects of the density of markers relative to the full set of all polymorphisms in the region on the results of haplotype partitioning for five popular haplotype block partition methods: three methods in Haploview (confidence interval, four gamete test, and solid spine), MIG++ implemented in PLINK 1.
View Article and Find Full Text PDFBy jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive.
View Article and Find Full Text PDFGene-based analysis of multiple single nucleotide polymorphisms (SNPs) in a gene region is an alternative to single SNP analysis. The multi-bin linear combination test (MLC) proposed in previous studies utilizes the correlation among SNPs within a gene to construct a gene-based global test. SNPs are partitioned into clusters of highly correlated SNPs, and the MLC test statistic quadratically combines linear combination statistics constructed for each cluster.
View Article and Find Full Text PDFMulti-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test.
View Article and Find Full Text PDFObesity (Silver Spring)
November 2013
Objective: Although genome-wide association studies (GWAS) have substantially contributed to understanding the genetic architecture, unidentified variants for complex traits remain an issue. One of the efficient approaches is the improvement of the power of GWAS scan by weighting P values with prior linkage signals. Our objective was to identify the novel candidates for obesity in Asian populations by using genemapping strategies that combine linkage and association analyses.
View Article and Find Full Text PDFThe estimated glomerular filtration rate is a well-known measure of renal function and is widely used to follow the course of disease. Although there have been several investigations establishing the genetic background contributing to renal function, Asian populations have rarely been used in these genome-wide studies. Here, we aimed to find candidate genetic determinants of renal function in 1007 individuals from 73 extended families of Mongolian origin.
View Article and Find Full Text PDF