Publications by authors named "Luis Santana-Quintero"

Mouse (Mus musculus) models have been widely used in developmental biology research to understand mammalian embryonic development, as mice share many genetic, physiological, and developmental characteristics with humans. New explorations into the integration of temporal (stage-specific) and transcriptional (tissue-specific) data have expanded our knowledge of mouse embryo tissue-specific gene functions. To better understand the substantial impact of synonymous mutational variations in the cell-state-specific transcriptome on a tissue's codon and codon pair usage landscape, we have established a novel resource: Mouse Embryo Codon and Codon Pair Usage Tables (Mouse Embryo CoCoPUTs).


Nucleic acid tests for blood donor screening have improved the safety of the blood supply; however, increasing numbers of emerging pathogen tests are burdensome. Multiplex testing platforms are a potential solution. The Blood Borne Pathogen Resequencing Microarray Expanded (BBP-RMAv.


Objectives: Anaphylaxis is a severe life-threatening allergic reaction, and its accurate identification in healthcare databases can harness the potential of "Big Data" for healthcare or public health purposes.

Materials and Methods: This study used claims data from the CMS database, collected between October 1, 2015 and February 28, 2019, to examine the utility of machine learning in identifying incident anaphylaxis cases. We created a feature selection pipeline to identify critical features between different datasets.

Article Synopsis
  • In 2020, Novartis and the FDA began a 4-year collaboration to explore radio-genomics for predicting factors in HR+/HER2- metastatic breast cancer.
  • The partnership focuses on harnessing advanced analytics and AI to improve future scientific projects.
  • The tutorial offers guidelines for conducting multi-omics research, emphasizing communication, data practices, and outlining a four-step process: plan, design, develop, and disseminate.


Very little is known about disease transmission via the gut microbiome. We hypothesized that certain inflammatory features could be transmitted via the gut microbiome and tested this hypothesis using an animal model of inflammatory diseases. Twelve-week-old healthy C57BL/6 and germ-free (GF) female and male mice received fecal matter transplants (FMT) under anaerobic conditions from TNF donors exhibiting spontaneous rheumatoid arthritis (RA) and inflammatory bowel disease (IBD) or from conventional healthy control donor mice.


Background: Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data.

Results: In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection.


Background: Gene expression is highly variable across tissues of multicellular organisms, influencing the codon usage of the tissue-specific transcriptome. Cancer disrupts the gene expression pattern of healthy tissue, resulting in altered codon usage preferences. Codon usage changes in cancer, as they relate to codon demand and tRNA supply, are of growing interest.


Protein expression in multicellular organisms varies widely across tissues. Codon usage in the transcriptome of each tissue is derived from genomic codon usage and the relative expression level of each gene. We created a comprehensive computational resource that houses tissue-specific codon, codon-pair, and dinucleotide usage data for 51 Homo sapiens tissues (TissueCoCoPUTs: https://hive.
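The idea that a tissue's codon usage follows from genomic codon usage and per-gene expression can be sketched as an expression-weighted sum of codon counts. This is a minimal illustration only, not the resource's actual pipeline: the function names, the toy sequences, and the expression values are all hypothetical.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in a coding sequence whose length is a multiple of three."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def weighted_codon_usage(sequences: dict, expression: dict) -> dict:
    """Return codon frequencies, weighting each gene's counts by its expression."""
    totals = Counter()
    for gene, cds in sequences.items():
        weight = expression.get(gene, 0.0)  # genes without a value contribute nothing
        for codon, n in codon_counts(cds).items():
            totals[codon] += weight * n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {codon: value / grand_total for codon, value in totals.items()}


# Hypothetical toy data: two short coding sequences and made-up expression levels.
toy_sequences = {"geneA": "ATGGCTGCT", "geneB": "ATGGCCTAA"}
toy_expression = {"geneA": 3.0, "geneB": 1.0}
usage = weighted_codon_usage(toy_sequences, toy_expression)
```

With these toy inputs, the highly expressed geneA dominates the result, so GCT accounts for half of the weighted codon usage even though it appears in only one of the two genes.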


Most viruses are known to spontaneously generate defective viral genomes (DVGs) due to errors during replication. These DVGs are subgenomic and contain deletions that render them unable to complete a full replication cycle in the absence of a co-infecting, non-defective helper virus. DVGs, especially those of the copyback type frequently observed with paramyxoviruses, have been recognized as important triggers of the antiviral innate immune response.


Usage of sequential codon pairs is non-random and unique to each species. Codon-pair bias is related to, but clearly distinct from, individual codon usage bias. Codon-pair bias is thought to affect translational fidelity and efficiency and is presumed to be under selective pressure.
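One simple way to see non-random codon-pair usage is to compare how often each adjacent pair is observed with how often it would be expected if codons were placed independently. The sketch below is a deliberately simplified observed-to-expected ratio, assumed here for illustration; published codon-pair scores typically also normalize by amino acid pair frequencies, and the function name and toy sequence are hypothetical.

```python
from collections import Counter


def codon_pair_ratios(cds: str) -> dict:
    """Observed/expected ratio for each adjacent codon pair in one sequence.

    Expected counts assume codons are placed independently of their neighbors.
    """
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    singles = Counter(codons)
    pairs = Counter(zip(codons, codons[1:]))
    n_codons = len(codons)
    n_pairs = n_codons - 1
    ratios = {}
    for (first, second), observed in pairs.items():
        expected = (singles[first] / n_codons) * (singles[second] / n_codons) * n_pairs
        ratios[(first, second)] = observed / expected
    return ratios


# Hypothetical toy sequence: GCT GCT GAA GCT GAA
ratios = codon_pair_ratios("GCTGCTGAAGCTGAA")
```

A ratio above 1 marks a pair that occurs more often than independent placement would predict, and a ratio below 1 marks an under-represented pair. In the toy sequence the pair (GCT, GAA) is over-represented while (GCT, GCT) is under-represented.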


Whole genome sequencing of bacterial isolates has become a daily task in many laboratories, generating enormous amounts of data. However, data acquisition is not an end in itself; the goal is to acquire high-quality data useful for understanding genetic relationships. A method that could rapidly determine which of the many available run metrics are the most important indicators of overall run quality, together with a way to monitor these metrics during a given sequencing run, would be extremely helpful to this end.


Background: Due to the degeneracy of the genetic code, most amino acids can be encoded by multiple synonymous codons. Synonymous codons naturally occur with different frequencies in different organisms. The choice of codons may affect protein expression, structure, and function.


The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes.


Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. The High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner.


Efficiency has become one of the main concerns in evolutionary multiobjective optimization in recent years. One way to achieve faster convergence is to use a relaxed form of Pareto dominance that allows us to regulate the granularity of the approximation of the Pareto front that we wish to achieve. One such relaxed form of Pareto dominance that has become popular in the last few years is epsilon-dominance, which has mainly been used as an archiving strategy in some multiobjective evolutionary algorithms.
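To make the archiving idea concrete, here is a minimal sketch that assumes the additive form of epsilon-dominance with all objectives minimized: a point a epsilon-dominates b when a_i - eps <= b_i for every objective i. The archive update rule, function names, and sample points are illustrative assumptions, not the specific scheme of any particular algorithm; grid-based variants are also common.

```python
from typing import List, Sequence

Point = Sequence[float]


def epsilon_dominates(a: Point, b: Point, eps: float) -> bool:
    """True if a additively eps-dominates b (all objectives minimized)."""
    return all(x - eps <= y for x, y in zip(a, b))


def dominates(a: Point, b: Point) -> bool:
    """Standard Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Point], candidate: Point, eps: float) -> List[Point]:
    """Reject candidates that an archived point eps-dominates; otherwise add the
    candidate and drop archived points that it Pareto-dominates."""
    if any(epsilon_dominates(kept, candidate, eps) for kept in archive):
        return archive
    return [kept for kept in archive if not dominates(candidate, kept)] + [candidate]


# Hypothetical two-objective points, processed in order with eps = 0.1.
points = [(1.0, 5.0), (1.05, 4.98), (3.0, 2.0), (5.0, 1.0), (4.0, 4.0)]
archive: List[Point] = []
for p in points:
    archive = update_archive(archive, p, eps=0.1)
```

Here (1.05, 4.98) is rejected because it lies within eps of the archived (1.0, 5.0), and (4.0, 4.0) is rejected because (3.0, 2.0) epsilon-dominates it. A larger eps yields a coarser, smaller archive, which is the granularity control the abstract describes.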
