We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype.
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities.
Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed. We quantify these differences for brain tissue classification, fMRI analysis, and cortical thickness (CT) extraction, using three of the main neuroimaging packages (FSL, Freesurfer and CIVET) and different versions of GNU/Linux. We also identify some causes of these differences using library and system call interception.
Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10⁻⁸).
Background: QT interval (QT) prolongation is an established risk factor for ventricular tachyarrhythmia and sudden cardiac death. Previous genome-wide association studies in populations of European descent have identified multiple genetic loci that influence QT, but few have examined these loci in ethnically diverse populations.
Methods: Here, we examine the direction, magnitude, and precision of effect sizes for 21 previously reported SNPs from 12 QT loci, in populations of European (n = 16,398), African (n = 5,437), American Indian (n = 5,032), Hispanic (n = 1,143), and Asian (n = 932) descent as part of the Population Architecture using Genomics and Epidemiology (PAGE) study.
Background: Multiple primary cancers account for approximately 16% of all incident cancers in the United States. Although genome-wide association studies (GWAS) have identified many common genetic variants associated with various cancer sites, no study has examined the association of these genetic variants with risk of multiple primary cancers (MPC).
Methods: As part of the National Human Genome Research Institute (NHGRI) Population Architecture using Genomics and Epidemiology (PAGE) study, we used data from the Multiethnic Cohort (MEC) and Women's Health Initiative (WHI).
Background: Genome-wide association studies have identified hundreds of genetic variants associated with specific cancers. A few of these risk regions have been associated with more than one cancer site; however, a systematic evaluation of the associations between risk variants for other cancers and lung cancer risk has yet to be performed.
Methods: We included 18023 patients with lung cancer and 60543 control subjects from two consortia, Population Architecture using Genomics and Epidemiology (PAGE) and Transdisciplinary Research in Cancer of the Lung (TRICL).
Background: C-reactive protein (CRP) is a biomarker of inflammation. Genome-wide association studies (GWAS) have identified single-nucleotide polymorphisms (SNPs) associated with CRP concentrations and inflammation-related traits such as cardiovascular disease, type 2 diabetes mellitus, and obesity. We aimed to replicate previous CRP-SNP associations, assess whether these associations generalize to additional race/ethnicity groups, and evaluate inflammation-related SNPs for a potentially pleiotropic association with CRP.
Background: Risk of non-Hodgkin lymphoma (NHL) is higher among individuals with a family history or a prior diagnosis of other cancers. Genome-wide association studies (GWAS) have suggested that some genetic susceptibility variants are associated with multiple complex traits (pleiotropy).
Objective: We investigated whether common risk variants identified in cancer GWAS may also increase the risk of developing NHL as the first primary cancer.
Background: A number of genetic variants have been discovered by recent genome-wide association studies for their associations with clinical coronary heart disease (CHD). However, it is unclear whether these variants are also associated with the development of CHD as measured by subclinical atherosclerosis phenotypes, ankle brachial index (ABI), carotid artery intima-media thickness (cIMT) and carotid plaque.
Methods: Ten CHD risk single nucleotide polymorphisms (SNPs) were genotyped in individuals of European American (EA), African American (AA), American Indian (AI), and Mexican American (MA) ancestry in the Population Architecture using Genomics and Epidemiology (PAGE) study.
A loss-of-function mutation (Q141K, rs2231142) in the ATP-binding cassette, subfamily G, member 2 gene (ABCG2) has been shown to be associated with serum uric acid levels and gout in Asians, Europeans, and European and African Americans; however, less is known about these associations in other populations. Rs2231142 was genotyped in 22,734 European Americans, 9,720 African Americans, 3,849 Mexican Americans, and 3,550 American Indians in the Population Architecture using Genomics and Epidemiology (PAGE) Study (2008-2012). Rs2231142 was significantly associated with serum uric acid levels (P = 2.
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers.
Philos Trans A Math Phys Eng Sci
January 2013
The current model of transferring data from data centres to desktops for analysis will soon be rendered impractical by the accelerating growth in the volume of science datasets. Processing will instead often take place on high-performance servers co-located with data. Evaluations of how new technologies such as cloud computing would support such a new distributed computing model are urgently needed.
We examined the association between HNF1B variants identified in a recent genome-wide association study and endometrial cancer in two large case-control studies nested in prospective cohorts: the Multiethnic Cohort Study (MEC) and the Women's Health Initiative (WHI) as part of the Population Architecture using Genomics and Epidemiology (PAGE) study. A total of 1,357 incident cases of invasive endometrial cancer and 7,609 controls were included in the analysis (MEC: 426 cases/3,854 controls; WHI: 931 cases/3,755 controls). The majority of women in the WHI were European American, while the MEC included sizable numbers of African Americans, Japanese and Latinos.
Background: Genome-wide association studies identified several single nucleotide polymorphisms (SNP) associated with prevalent coronary heart disease (CHD), but less is known of associations with incident CHD. The association of 13 published CHD SNPs was examined in 5 ancestry groups of 4 large US prospective cohorts.
Methods And Results: The analyses included incident coronary events over an average 9.
Summary: We have developed an RNA-Seq analysis workflow for single-ended Illumina reads, termed RseqFlow. This workflow includes a set of analytic functions, such as quality control for sequencing data, signal tracks of mapped reads, calculation of expression levels, identification of differentially expressed genes and coding SNP calling. This workflow is formalized and managed by the Pegasus Workflow Management System, which maps the analysis modules onto available computational resources, automatically executes the steps in the appropriate order and supervises the whole running process.
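The idea of executing analysis steps "in the appropriate order" can be illustrated with a minimal sketch of dependency-ordered execution. This is not the Pegasus API and not RseqFlow's actual module graph: the step names loosely follow the abstract, while the `read_mapping` step, the dependency edges, and the helper names `execution_order` and `run_workflow` are assumptions made for illustration. It uses only the Python standard library (`graphlib`, Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key maps to the set of steps it depends on.
# Step names loosely follow the abstract; the edges are assumed for illustration.
DEPENDENCIES = {
    "quality_control": set(),
    "read_mapping": {"quality_control"},
    "signal_tracks": {"read_mapping"},
    "expression_levels": {"read_mapping"},
    "differential_expression": {"expression_levels"},
    "coding_snp_calling": {"read_mapping"},
}


def execution_order(dependencies):
    """Return one valid order in which every step follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, run_step):
    """Call run_step(name) for each step in dependency order; return that order."""
    order = execution_order(dependencies)
    for name in order:
        run_step(name)
    return order


if __name__ == "__main__":
    # Stand-in for real work: just report which step would run.
    run_workflow(DEPENDENCIES, lambda name: print("running", name))
```

A real workflow manager additionally assigns each step to a compute resource and monitors it; the sketch covers only the ordering aspect.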
Philos Trans A Math Phys Eng Sci
August 2011
This paper presents a case study of an approach to sustainable software architecture that has been successfully applied over a period of 10 years to astronomy software services at the NASA Infrared Processing and Analysis Center (IPAC), Caltech (http://www.ipac.caltech.
The advent of data-intensive science has sharpened our need for better communication within and between the fields of science and technology, to name a few. No one mind can encompass all that is necessary to be successful in controlling and analyzing the data deluge we are experiencing. Therefore, we must bring together diverse fields, communicate clearly, and build cross-disciplinary methods and tools to realize its potential.
Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism.
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance parameters to the applications that are known to affect their execution times. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the analysis, others may dictate trading the output quality of individual components (and of the whole workflow) for performance.
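The search described above can be sketched, under simplifying assumptions, as an exhaustive grid search that picks the fastest configuration meeting a minimum output quality. This is not the authors' method: the parameter names (`chunk_size`, `resolution`), the toy cost model in `toy_evaluate`, and the function `best_configuration` are all hypothetical and serve only to make the quality-for-performance trade-off concrete.

```python
from itertools import product


def best_configuration(parameter_space, evaluate, min_quality):
    """Exhaustively search the grid of parameter values.

    Returns (config, runtime, quality) for the fastest configuration whose
    quality is at least min_quality, or None if no configuration qualifies.
    """
    names = list(parameter_space)
    best = None
    for values in product(*(parameter_space[name] for name in names)):
        config = dict(zip(names, values))
        runtime, quality = evaluate(config)
        if quality < min_quality:
            continue  # too inaccurate, regardless of speed
        if best is None or runtime < best[1]:
            best = (config, runtime, quality)
    return best


def toy_evaluate(config):
    """Toy cost model (an assumption, not measured data).

    Larger chunks reduce runtime; higher resolution raises both runtime
    and output quality.
    """
    runtime = 100.0 / config["chunk_size"] * config["resolution"]
    quality = config["resolution"] / 4
    return runtime, quality


# Hypothetical parameter space: one parameter affects only performance,
# the other trades quality for performance.
SPACE = {"chunk_size": [1, 2, 4], "resolution": [1, 2, 4]}

if __name__ == "__main__":
    print(best_configuration(SPACE, toy_evaluate, min_quality=0.5))
```

Exhaustive search is only practical for small grids; larger parameter spaces would call for a heuristic or model-guided search.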
Int J High Perform Comput Appl
August 2009
Integrative biomedical research projects query, analyze, and integrate many different data types and make use of datasets obtained from measurements or simulations of structure and function at multiple biological scales. With the increasing availability of high-throughput and high-resolution instruments, integrative biomedical research imposes many challenging requirements on software middleware systems. In this paper, we look at some of these requirements using example research pattern templates.
Proc Int Symp High Perform Distrib Comput
January 2009
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance.