Publications by authors named "Keith A Boroevich"

Tabular data analysis is a critical task in various domains, enabling us to uncover valuable insights from structured datasets. While traditional machine learning methods can be used for feature engineering and dimensionality reduction, they often struggle to capture the intricate relationships and dependencies within real-world datasets. In this paper, we present Multi-representation DeepInsight (MRep-DeepInsight), a novel extension of the DeepInsight method designed to enhance the analysis of tabular data.

View Article and Find Full Text PDF

The field of omics, driven by advances in high-throughput sequencing, faces a data explosion. This abundance of data offers unprecedented opportunities for predictive modeling in precision medicine, but also presents formidable challenges in data analysis and interpretation. Traditional machine learning (ML) techniques have been partly successful in generating predictive models for omics analysis but exhibit limitations in handling potential relationships within the data for more accurate prediction.

View Article and Find Full Text PDF

Annotation of cell-types is a critical step in the analysis of single-cell RNA sequencing (scRNA-seq) data that allows the study of heterogeneity across multiple cell populations. Currently, this is most commonly done using unsupervised clustering algorithms, which project single-cell expression data into a lower dimensional space and then cluster cells based on their distances from each other. However, as these methods do not use reference datasets, they can only achieve a rough classification of cell-types, and it is difficult to improve the recognition accuracy further.

View Article and Find Full Text PDF

Accumulating evidence indicates that long intergenic non-coding RNAs (lincRNAs) show more tissue-specific expression patterns than protein-coding genes (PCGs). However, although lincRNAs are subject to canonical transcriptional regulation like PCGs, the molecular basis for the specificity of their expression patterns remains unclear. Here, using expression data and coordinates of topologically associating domains (TADs) in human tissues, we show that lincRNA loci are significantly enriched in the more internal region of TADs compared to PCGs and that lincRNAs within TADs have higher tissue specificity than those outside TADs.

View Article and Find Full Text PDF

Modern oncology offers a wide range of treatments, so choosing the best option for a particular patient is very important for an optimal outcome. Multi-omics profiling in combination with AI-based predictive models has great potential for streamlining these treatment decisions. However, these encouraging developments continue to be hampered by the very high dimensionality of the datasets in combination with insufficiently large numbers of annotated samples.

View Article and Find Full Text PDF

Background: Immune status in the tumor microenvironment is an important determinant of cancer progression and patient prognosis. Although a higher immune activity is often associated with a better prognosis, this trend is not absolute and differs across cancer types. We aimed to give insights into why some cancers do not show better survival despite higher immunity by assessing the relationship between different biological factors, including cytotoxicity, and patient prognosis in various cancer types using RNA-seq data collected by The Cancer Genome Atlas.

View Article and Find Full Text PDF
Article Synopsis
  • A specific cancer subtype with immune evasion is linked to poor survival rates due to a lack of highly expressed neoantigens and high chromosomal instability, which contribute to immune resistance.
  • The study suggests that analyzing the tumor microenvironment and neoantigen makeup could serve as valuable prognostic tools for treatment decisions in advanced colorectal cancer.
View Article and Find Full Text PDF

Artificial intelligence methods offer exciting new capabilities for the discovery of biological mechanisms from raw data because they are able to detect vastly more complex patterns of association that cannot be captured by classical statistical tests. Among these methods, deep neural networks are currently among the most advanced approaches and, in particular, convolutional neural networks (CNNs) have been shown to perform excellently for a variety of difficult tasks. Despite that, the application of this type of network to high-dimensional omics data and, most importantly, the meaningful interpretation of the results returned from such models in a biomedical context remain open problems.

View Article and Find Full Text PDF

Background: Mild cognitive impairment (MCI) is a precursor to Alzheimer's disease (AD), but not all MCI patients develop AD. Biomarkers for early detection of individuals at high risk for MCI-to-AD conversion are urgently required.

Methods: We used blood-based microRNA expression profiles and genomic data of 197 Japanese MCI patients to construct a prognosis prediction model based on a Cox proportional hazard model.

View Article and Find Full Text PDF

Aims: Monogenic diabetes is clinically heterogeneous and differs from common forms of diabetes (type 1 and 2). We aimed to investigate the clinical usefulness of a comprehensive genetic testing system, comprised of targeted next-generation sequencing (NGS) with phenotype-driven bioinformatics analysis in patients with monogenic diabetes, which uses patient genotypic and phenotypic data to prioritize potentially causal variants.

Methods: We performed targeted NGS of 383 genes associated with monogenic diabetes or common forms of diabetes in 13 Japanese patients with suspected (n = 10) or previously diagnosed (n = 3) monogenic diabetes or severe insulin resistance.

View Article and Find Full Text PDF

The discovery of drivers of cancer has traditionally focused on protein-coding genes. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods.
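As a rough illustration of what combining significance levels from several methods can look like, the sketch below implements Fisher's method, a classic textbook technique. It is not the strategy developed in the paper, which this excerpt does not describe. Fisher's method assumes the p-values are independent, an assumption that results from multiple driver discovery methods run on the same genomes would generally not satisfy. The function name `fisher_combine` is hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (illustrative only)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null hypothesis it
    # follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k) the chi-square survival
    # function has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

With a single input the function returns that p-value unchanged, and two moderately small p-values combine into a smaller one, which is the behaviour expected from pooling independent evidence.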

View Article and Find Full Text PDF
Article Synopsis
  • Metastasis is a leading cause of cancer death, making it crucial to understand how it develops and where metastatic tumor cells come from.
  • While it's often thought that metastatic tumors arise from a single cell of the primary tumor, new research suggests they may actually come from multiple cells, evidenced by data from mouse studies and human whole-exome sequencing.
  • A new method was created to quantify the number of "founder cells" in metastasis, and when applied to colorectal cancer patients, it showed that the origin of metastatic tumors varied widely, with 3 to 17 founder cells per tumor, indicating significant genetic differences that might affect treatment responses.
View Article and Find Full Text PDF

Background: Dementia with Lewy bodies (DLB) is the second most common subtype of neurodegenerative dementia in humans following Alzheimer's disease (AD). The present clinical diagnosis of DLB has high specificity but low sensitivity, and finding potential biomarkers of prodromal DLB is still challenging. MicroRNAs (miRNAs) have recently received a lot of attention as a source of novel biomarkers.

View Article and Find Full Text PDF

It is critical, but difficult, to catch the small variations in genomic or other kinds of data that differentiate phenotypes or categories. A plethora of data is available, but the information from its genes or elements is spread arbitrarily, making it challenging to extract relevant details for identification. However, an arrangement of similar genes into clusters makes these differences more accessible and allows for robust identification of hidden mechanisms (e.

View Article and Find Full Text PDF

Alzheimer's disease (AD) is the most common subtype of dementia, followed by vascular dementia (VaD) and dementia with Lewy bodies (DLB). Recently, microRNAs (miRNAs) have received a lot of attention as novel biomarkers for dementia. Here, using serum miRNA expression of 1,601 Japanese individuals, we investigated potential miRNA biomarkers and constructed risk prediction models, based on a supervised principal component analysis (PCA) logistic regression method, according to the subtype of dementia.

View Article and Find Full Text PDF

Recent trends in drug development have been marked by diminishing returns caused by escalating costs and falling rates of new drug approval. Unacceptable drug toxicity is a substantial cause of drug failure during clinical trials and the leading cause of drug withdrawals after release to the market. Computational methods capable of predicting these failures can reduce the waste of resources and time devoted to the investigation of compounds that ultimately fail.

View Article and Find Full Text PDF

Alzheimer's disease (AD) is a common neurological disease that causes dementia in humans. Although the reports of associated pathological genes have been increasing, the molecular mechanism leading to the accumulation of amyloid-β (Aβ) in human brain is still not well understood. To identify novel genes that cause accumulation of Aβ in AD patients, we conducted an integrative analysis by combining a human genetic association study and transcriptome analysis in mouse brain.

View Article and Find Full Text PDF

Microcephaly-capillary malformation syndrome is a congenital neurodevelopmental disorder caused by biallelic mutations in the STAMBP gene. Here we identify a novel homozygous mutation located in the SH3 binding motif of STAMBP (NM_006463.4) (c.

View Article and Find Full Text PDF

Genome-wide association studies (GWAS) suggest that the genetic architecture of complex diseases consists of unexpectedly numerous variants with small effect sizes. However, the polygenic architectures of many diseases have not been well characterized due to the lack of simple and fast methods for unbiased estimation of the underlying proportion of disease-associated variants and their effect-size distribution. Applying empirical Bayes estimation of semi-parametric hierarchical mixture models to GWAS summary statistics, we confirmed that schizophrenia was extremely polygenic [~40% of independent genome-wide SNPs are risk variants, most within odds ratio (OR = 1.

View Article and Find Full Text PDF

Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (≥50 bp), due to the short DNA sequencing reads.

View Article and Find Full Text PDF

Intellectual disability (ID) is a neurodevelopmental disorder characterized by serious defects in both intelligence and adaptive behavior. Although it has been suggested that genetic aberrations associated with the process of cell division underlie ID, cytological evidence for mitotic defects in actual patients' cells is rarely reported. Here, we report a novel mutation in the STARD9 (also known as KIF16A) gene found in a patient with severe ID, characteristic features, epilepsy, acquired microcephaly, and blindness.

View Article and Find Full Text PDF

The insulin receptor (INSR) gene was analyzed in four patients with severe insulin resistance, revealing five novel mutations and a deletion that removed exon 2. A patient with Donohue syndrome (DS) had a novel p.V657F mutation in the second fibronectin type III domain (FnIII-2), which contains the α-β cleavage site and part of the insulin-binding site.

View Article and Find Full Text PDF

Background: Refinement of candidate gene lists to select the most promising candidates for further experimental verification remains an essential step between high-throughput exploratory analysis and the discovery of specific causal genes. Given the qualitative and semantic complexity of biological data, successfully addressing this challenge requires development of flexible and interoperable solutions for making the best possible use of the largest possible fraction of all available data.

Results: We have developed an easily accessible framework that links two established network-based gene prioritization approaches with a supporting isolation forest-based integrative ranking method.

View Article and Find Full Text PDF