J Am Med Inform Assoc
April 2024
Objective: To introduce 2 R-packages that facilitate conducting health economics research on OMOP-based data networks, aiming to standardize and improve the reproducibility, transparency, and transferability of health economic models.
Materials And Methods: We developed the software tools and demonstrated their utility by replicating a UK-based heart failure data analysis across 5 different international databases from Estonia, Spain, Serbia, and the United States.
Results: We examined treatment trajectories of 47 163 patients.
Objective: To describe the reusable transformation process of electronic health records (EHR), claims, and prescriptions data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), together with the challenges faced and solutions implemented.
Materials And Methods: We used Estonian national health databases that store almost all residents' claims, prescriptions, and EHR records. To develop and demonstrate the transformation process of Estonian health data to OMOP CDM, we used a 10% random sample of the Estonian population (n = 150 824 patients) from 2012 to 2019 (MAITT dataset).
g:Profiler is a reliable and up-to-date functional enrichment analysis tool that supports various evidence types, identifier types and organisms. The toolset integrates many databases, including Gene Ontology, KEGG and TRANSFAC, to provide a comprehensive and in-depth analysis of gene lists. It also provides interactive and intuitive user interfaces and supports ordered queries and custom statistical backgrounds, among other settings.
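Functional enrichment of a gene list is commonly framed as an over-representation test. The following is a minimal sketch under stated assumptions: it uses a one-sided hypergeometric test, which the abstract above does not name, and it does not reproduce g:Profiler's own implementation or its multiple-testing correction. The function name and the gene identifiers are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(query, term_genes, background):
    """Return (overlap, p) for a one-sided hypergeometric over-representation test.

    query       -- genes in the user's list
    term_genes  -- genes annotated to one functional term
    background  -- the statistical background (all genes considered)
    """
    background = set(background)
    query = set(query) & background      # ignore genes outside the background
    term = set(term_genes) & background

    N = len(background)                  # background size
    K = len(term)                        # annotated genes in the background
    n = len(query)                       # query size
    k = len(query & term)                # observed overlap

    # P(X >= k): probability of drawing at least k annotated genes by chance
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Hypothetical toy data: 10 background genes, 3 of them annotated to a term.
background = [f"g{i}" for i in range(10)]
term = ["g0", "g1", "g2"]
overlap, p = hypergeom_enrichment_p(["g0", "g1", "g2"], term, background)
```

A custom statistical background, as mentioned in the abstract, corresponds here to passing a different `background` collection; in practice the p-values for many terms would also need correcting for multiple testing.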
Background: Ischemic stroke (IS) is a major health risk without generally usable effective measures of primary prevention. Early warning signals that are easy to detect and widely available can save lives. Estonia has one nation-wide Electronic Health Record (EHR) database for the storage of medical information of patients from hospitals and primary care providers.
Importance: Large-scale data on type-specific human papillomavirus (HPV) prevalence and disease burden worldwide are needed to guide cervical cancer prevention efforts. Promoting the research and application of health care big data has become a key factor in modern medical research.
Objective: To examine the prevaccination prevalence of high-risk HPV (hrHPV) and type distribution by cervical cytology grade in Estonia.
Copy-number variations (CNV) are believed to play an important role in a wide range of complex traits, but discovering such associations remains challenging. While whole-genome sequencing (WGS) is the gold-standard approach for CNV detection, there are several orders of magnitude more samples with available genotyping microarray data. Such array data can be exploited for CNV detection using dedicated software (e.
Many eukaryotic genes can give rise to different alternative transcripts depending on stage of development, cell type, and physiological cues. Current transcriptome-wide sequencing technologies highlight the remarkable extent of this regulation in metazoans and allow for RNA isoforms to be profiled in increasingly small biological samples and with a growing confidence. Understanding biological functions of sample-specific transcripts is a major challenge in genomics and RNA processing fields.
Objective: To develop a framework for identifying temporal clinical event trajectories from Observational Medical Outcomes Partnership-formatted observational healthcare data.
Materials And Methods: A 4-step framework based on significant temporal event pair detection is described and implemented as an open-source R package. It is used on a population-based Estonian dataset to first replicate a large Danish population-based study and second, to conduct a disease trajectory detection study for type 2 diabetes patients in the Estonian and Dutch databases as an example.
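The idea of detecting temporal event pairs can be illustrated with a short sketch. This is not the published R package or its 4-step framework; it assumes, for illustration only, that each patient is represented as a list of (day, event code) tuples, that only the first occurrence of each event matters, and that directionality is assessed with a one-sided binomial test against a 50/50 null. All function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def directional_pairs(patients):
    """Count, per ordered pair (a, b), the patients whose first a precedes their first b.

    patients -- dict mapping patient id to a list of (day, event_code) tuples
    """
    counts = Counter()
    for events in patients.values():
        first = {}
        for day, code in sorted(events):
            first.setdefault(code, day)          # keep the earliest day per event
        for a, b in combinations(sorted(first), 2):
            if first[a] < first[b]:
                counts[(a, b)] += 1
            elif first[b] < first[a]:
                counts[(b, a)] += 1              # same-day pairs are skipped
    return counts


def direction_p_value(n_ab, n_ba):
    """One-sided binomial p-value that a->b is seen at least n_ab times if order were random."""
    n = n_ab + n_ba
    return sum(comb(n, i) for i in range(n_ab, n + 1)) / 2 ** n


# Hypothetical toy cohort: event "A" precedes "B" in two of three patients.
patients = {
    1: [(1, "A"), (5, "B")],
    2: [(2, "A"), (3, "B")],
    3: [(4, "B"), (9, "A")],
}
counts = directional_pairs(patients)
p_ab = direction_p_value(counts[("A", "B")], counts[("B", "A")])
```

Pairs whose p-value survives multiple-testing correction could then be chained into longer trajectories; how the published framework does this is not described in the abstract.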
Background: Protein microarray is a well-established approach for characterizing activity levels of thousands of proteins in a parallel manner. Analysis of protein microarray data is complex and time-consuming, while existing solutions are either outdated or challenging to use without programming skills. The typical data analysis pipeline consists of a data preprocessing step, followed by differential expression analysis, which is then put into context via functional enrichment.
The molecular basis of aging and of aging-associated diseases is being unraveled at an increasing pace. An extended healthspan, and not merely an extension of lifespan, has become the aim of medical practice. Here, we define health based on the absence of diseases and dysfunctions.
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Alzheimer's disease and other types of dementia are the leading cause of disability in later life, and various types of experiments have been performed to understand the underlying mechanisms of the disease, with the aim of identifying potential drug targets. These experiments have been carried out by scientists working in different domains such as proteomics, molecular biology, clinical diagnostics and genomics. The results of such experiments are stored in databases designed for collecting data of similar types.
Biological data analysis often deals with lists of genes arising from various studies. The g:Profiler toolset is widely used for finding biological categories enriched in gene lists, conversions between gene identifiers and mappings to their orthologs. The mission of g:Profiler is to provide a reliable service based on up-to-date high quality data in a convenient manner across many evidence types, identifier spaces and organisms.
The Estonian Biobank, governed by the Institute of Genomics at the University of Tartu (Biobank), has stored genetic material/DNA and continuously collected data since 2002 on a total of 52,274 individuals, representing ~5% of the Estonian adult population, and this number is increasing. To explore the utility of data available in the Biobank, we conducted a phenome-wide association study (PheWAS) in two areas of interest to healthcare researchers: asthma and liver disease. We used 11 asthma and 13 liver disease-associated single nucleotide polymorphisms (SNPs), identified from published genome-wide association studies, to test our ability to detect established associations.
Allele-specific analyses to understand frequency differences across populations, particularly populations not well studied, are important to help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, facilitating new Genome-Wide Association Studies (GWAS). We aimed to compare the allele frequency of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian, HapMap and 1000 Genomes Project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNP variants reporting statistical significance.
Background: A widely applied approach to extract knowledge from high-throughput genomic data is clustering of gene expression profiles followed by functional enrichment analysis. This type of analysis, when done manually, is highly subjective and has limited reproducibility. Moreover, this pipeline can be very time-consuming and resource-demanding as enrichment analysis is done for tens to hundreds of clusters at a time.
Pharmacogenomics aims to tailor pharmacological treatment to each individual by considering associations between genetic polymorphisms and adverse drug effects (ADEs). With technological advances, pharmacogenomic research has evolved from candidate gene analyses to genome-wide association studies. Here, we integrate deep whole-genome sequencing (WGS) information with drug prescription and ADE data from Estonian electronic health record (EHR) databases to evaluate genome- and pharmacome-wide associations on an unprecedented scale.
Purpose: Biomedical databases combining electronic medical records and phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required that systematically translate the contained information into treatment recommendations based on existing genotype-phenotype associations.
Methods: We developed and tested algorithms for translation of preexisting genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations.
Background: Modern activity trackers, including the Fitbit Zip, enable the measurement of both step count and physical activity (PA) intensities. However, there is a need for field-based validation studies in a variety of populations before using trackers for research. Therefore, the purpose of the current study was to investigate the validity of Fitbit Zip step count, moderate to vigorous physical activity (MVPA) and sedentary minutes in different school segments among 3rd grade students.
Background: Neuropathological findings support an autoimmune etiology as an underlying factor for loss of orexin-producing neurons in spontaneous narcolepsy type 1 (narcolepsy with cataplexy; sNT1) as well as in Pandemrix influenza vaccine-induced narcolepsy type 1 (Pdmx-NT1). The precise molecular target or antigens for the immune response have, however, remained elusive.
Methods: Here we have performed a comprehensive antigenic repertoire analysis of sera using the next-generation phage display method - mimotope variation analysis (MVA).
Aim: To develop a web tool for survival analysis based on CpG methylation patterns.
Materials & Methods: We utilized methylome data from 'The Cancer Genome Atlas' and used the Cox proportional-hazards model to develop an interactive web interface for survival analysis.
Results: MethSurv enables survival analysis for a CpG located in or near a query gene.
Background: Our main aim has been to design a framework to improve vancomycin dosing in neonates. This required the development and verification of a computerized dose adjustment application, DosOpt, to guide the selection.
Methods: Model fitting in DosOpt uses Bayesian methods for deriving individual pharmacokinetic (PK) estimates from population priors and patient therapeutic drug monitoring measurements.
High titer autoantibodies produced by B lymphocytes are clinically important features of many common autoimmune diseases. APECED patients with deficient autoimmune regulator (AIRE) gene collectively display a broad repertoire of high titer autoantibodies, including some which are pathognomonic for major autoimmune diseases. AIRE deficiency severely reduces thymic expression of gene-products ordinarily restricted to discrete peripheral tissues, and developing T cells reactive to those gene-products are not inactivated during their development.