Treatment of non-small cell lung cancer is increasingly biomarker-driven, with multiple genomic alterations, including those in the epidermal growth factor receptor (EGFR) gene, that benefit from targeted therapies. We developed a set of algorithms to assess EGFR status and morphology using a real-world advanced lung adenocarcinoma cohort of 2099 patients with hematoxylin and eosin (H&E) images exhibiting high morphological diversity and low tumor content relative to public datasets. The best-performing EGFR algorithm was attention-based and achieved an area under the curve (AUC) of 0.
We used enzyme-linked immunoassay methods to measure the prevalence and the levels of antibody responses to the nucleocapsid (N) protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and four seasonal human coronaviruses (HCoV-OC43, HCoV-HKU1, HCoV-229E, and HCoV-NL63) in a cohort of 115 convalescent plasma donors infected with SARS-CoV-2 (1-61 days after symptom onset) compared to antibody levels in 114 individuals with no evidence of a recent infection with SARS-CoV-2. In the humoral response to the four seasonal coronaviruses, only the HCoV-HKU1 and HCoV-229E assays showed slightly elevated antibody levels in the COVID group compared to the control group. While in the COVID group the levels of SARS-CoV-2 antibodies correlated significantly with disease severity, no association was found in the levels of antibodies against the seasonal coronaviruses.
Loss of endosymbiotic algae ("bleaching") under heat stress has become a major problem for reef-building corals worldwide. To identify genes that might be involved in triggering or executing bleaching, or in protecting corals from it, we used RNAseq to analyze gene-expression changes during heat stress in a coral relative, the sea anemone Aiptasia. We identified >500 genes that showed rapid and extensive up-regulation upon temperature increase.
Genes involved in 3'-splice site recognition during mRNA splicing constitute an emerging class of oncogenes. SF3B1 is the most frequently mutated splicing factor in cancer, and SF3B1 mutants corrupt branchpoint recognition leading to usage of cryptic 3'-splice sites and subsequent aberrant junctions. For a comprehensive determination of alterations leading to this splicing pattern, we performed a pan-TCGA screening for SF3B1-specific aberrant acceptor usage.
We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network.
In cnidarian-Symbiodiniaceae symbioses, algal endosymbiont population control within the host is needed to sustain a symbiotic relationship. However, the molecular mechanisms that underlie such population control are unclear. Here we show that a cnidarian host uses nitrogen limitation as a primary mechanism to control endosymbiont populations.
The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce Data-Enriched Efficient PrEcise STatistical fusion detection (DEEPEST), an algorithm that uses statistical modeling to minimize false-positives while increasing the sensitivity of fusion detection.
Importance: Data sets linking comprehensive genomic profiling (CGP) to clinical outcomes may accelerate precision medicine.
Objective: To assess whether a database that combines EHR-derived clinical data with CGP can identify and extend associations in non-small cell lung cancer (NSCLC).
Design, Setting, and Participants: Clinical data from EHRs were linked with CGP results for 28 998 patients from 275 US oncology practices.
We present SeqOthello, an ultra-fast and memory-efficient indexing structure to support arbitrary sequence query against large collections of RNA-seq experiments. It takes SeqOthello only 5 min and 19.1 GB memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets.
Curr Protoc Bioinformatics
December 2017
Next-generation sequencing has produced petabytes of data, but accessing and analyzing these data remain challenging. Traditionally, researchers investigating public datasets like The Cancer Genome Atlas (TCGA) would download the data to a high-performance cluster, which could take several weeks even with a highly optimized network connection. The National Cancer Institute (NCI) initiated the Cancer Genomics Cloud Pilots program to provide researchers with the resources to process data with cloud computational resources.
The Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org) enables researchers to rapidly access and collaborate on massive public cancer genomic datasets, including The Cancer Genome Atlas.
Unlabelled: Mammalian lipopolysaccharide (LPS) binding proteins (LBPs) occur mainly in extracellular fluids and promote LPS delivery to specific host cell receptors. The function of LBPs has been studied principally in the context of host defense; the possible role of LBPs in nonpathogenic host-microbe interactions has not been well characterized. Using the Euprymna scolopes-Vibrio fischeri model, we analyzed the structure and function of an LBP family protein, E.
Proc Natl Acad Sci U S A
September 2015
The most diverse marine ecosystems, coral reefs, depend upon a functional symbiosis between a cnidarian animal host (the coral) and intracellular photosynthetic dinoflagellate algae. The molecular and cellular mechanisms underlying this endosymbiosis are not well understood, in part because of the difficulties of experimental work with corals. The small sea anemone Aiptasia provides a tractable laboratory model for investigating these mechanisms.
Front Plant Sci
August 2014
Dahlia variabilis, with an exceptionally high diversity of floral forms and colors, is a popular flower amongst both commercial growers and hobbyists. Recently, some genetic controls of pigment patterns have been elucidated. These studies have been limited, however, by the lack of comprehensive transcriptomic resources for this species.
Coral reefs provide habitats for a disproportionate number of marine species relative to the small area of the oceans that they occupy. The mutualism between the cnidarian animal hosts and their intracellular dinoflagellate symbionts provides the nutritional foundation for coral growth and formation of reef structures, because algal photosynthesis can provide >90% of the total energy of the host. Disruption of this symbiosis ("coral bleaching") is occurring on a large scale due primarily to anthropogenic factors and poses a major threat to the future of coral reefs.
Background: Coral reefs are hotspots of oceanic biodiversity, forming the foundation of ecosystems that are important both ecologically and for their direct practical impacts on humans. Corals are declining globally due to a number of stressors, including rising sea-surface temperatures and pollution; such stresses can lead to a breakdown of the essential symbiotic relationship between the coral host and its endosymbiotic dinoflagellates, a process known as coral bleaching. Although the environmental stresses causing this breakdown are largely known, the cellular mechanisms of symbiosis establishment, maintenance, and breakdown are still largely obscure.
Motivation: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs.
Results: We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads.
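The idea of collapsing identical and near-identical reads into single error-corrected sequences can be illustrated with a toy sketch. This is not Fulcrum's algorithm, which the abstract does not describe; the function names (`hamming`, `collapse_reads`), the greedy grouping, the Hamming-distance threshold `max_mismatches`, and the majority-vote consensus are all assumptions made for illustration. It handles only single-end reads of equal length.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_reads(reads, max_mismatches=1):
    """Greedily group near-identical reads and return (consensus, count) pairs.

    Illustrative assumption: reads are compared only against the most
    abundant sequence of each group, and only when lengths match.
    """
    # Step 1: collapse exact duplicates and visit the most abundant first.
    counts = Counter(reads)
    groups = []  # list of (representative sequence, Counter of members)
    for seq, n in counts.most_common():
        for rep, members in groups:
            if len(rep) == len(seq) and hamming(rep, seq) <= max_mismatches:
                members[seq] += n
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append((seq, Counter({seq: n})))

    # Step 2: build one error-corrected sequence per group by taking the
    # most frequent base at each position, weighted by read count.
    collapsed = []
    for rep, members in groups:
        consensus = []
        for i in range(len(rep)):
            column = Counter()
            for seq, n in members.items():
                column[seq[i]] += n
            consensus.append(column.most_common(1)[0][0])
        collapsed.append(("".join(consensus), sum(members.values())))
    return collapsed


if __name__ == "__main__":
    example = ["ACGT", "ACGT", "ACGA", "TTTT"]
    for sequence, support in collapse_reads(example):
        print(sequence, support)
```

In this sketch the single `ACGA` read is absorbed into the more abundant `ACGT` group, while `TTTT` stays separate because it differs at more than one position. A production tool would also need to handle paired-end reads, quality scores, and large inputs, none of which is attempted here.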