Publications by authors named "Jose C Montesinos-Lopez"

The popularity of genomic selection as an efficient and cost-effective approach to estimate breeding values continues to increase, due in part to the significant savings in phenotyping. Ridge regression is one of the most popular methods used for genomic prediction; however, its efficiency (in terms of prediction performance) depends on appropriate tuning of the penalization parameter. In this paper we propose a novel, more efficient method to select the optimal penalization parameter for Ridge regression.
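
For context, the conventional baseline that such a method would be compared against is a grid search with k-fold cross-validation. The sketch below is not the method proposed in the paper, which the snippet above does not describe. The function names, the lambda grid, and the simulated marker matrix are all illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form Ridge solution (X'X + lam*I)^-1 X'y; no intercept, data assumed centered."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest mean k-fold validation MSE, plus all mean MSEs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mean_mse = []
    for lam in lambdas:
        errs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            beta = ridge_fit(X[trn], y[trn], lam)
            errs.append(np.mean((y[val] - X[val] @ beta) ** 2))
        mean_mse.append(float(np.mean(errs)))
    best = int(np.argmin(mean_mse))
    return lambdas[best], mean_mse

# Simulated stand-in for a marker matrix (60 lines, 100 markers); not real data.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 100))
beta_true = 0.1 * rng.standard_normal(100)
y = X @ beta_true + rng.standard_normal(60)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]  # hypothetical grid
best_lam, mse = cv_select_lambda(X, y, lambdas)
```

The cost of this baseline grows with the grid size times the number of folds, which is the kind of overhead a more efficient selection method would aim to reduce.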

Genomic selection is revolutionizing both plant and animal breeding, with its practical application depending critically on high prediction accuracy. In this study, we aimed to enhance prediction accuracy by exploring the use of graph models within a linear mixed model framework. Our investigation revealed that incorporating the graph constructed with line connections alone resulted in decreased prediction accuracy compared to conventional methods that consider only genotype effects.

Article Synopsis
  • Genomic selection (GS) is revolutionizing plant breeding by making phenotyping more efficient, but its effectiveness can be hindered by mismatches between training and testing datasets, affecting model accuracy.
  • The study presents a new method using binary-Lasso regression to assign weights to features when training models, giving less weight to features that overly distinguish between training and testing sets.
  • This weighting approach was tested on six datasets, showing significant improvements in prediction accuracy, with easy implementation through the glmnet library for practical use.

Genomic prediction models for quantitative traits assume continuous and normally distributed phenotypes. In this research, we proposed a novel Bayesian discrete lognormal regression model. Genomic selection is a powerful tool in modern breeding programs that uses genomic information to predict the performance of individuals and select those with desirable traits.

Genomic selection (GS) is a predictive methodology that is changing plant breeding. Genomic selection trains a statistical machine-learning model using available phenotypic and genotypic data with which predictions are performed for individuals that were only genotyped. For this reason, some statistical machine-learning methods are being implemented in GS, but in order to improve the selection of new genotypes early in the prediction process, the exploration of new statistical machine-learning algorithms must continue.

Genomic-enabled prediction plays a key role in the success of genomic selection (GS). However, according to the No Free Lunch Theorem, there is no universal model that performs well for all data sets. For this reason, many statistical and machine learning models are available for genomic prediction.

The rapid spread of the new SARS-CoV-2 virus triggered a global health crisis, disproportionately impacting people with pre-existing health conditions and particular demographic and socioeconomic characteristics. One of the main concerns of governments has been to avoid health systems becoming overwhelmed. For this reason, they have implemented a series of non-pharmaceutical measures to control the spread of the virus, with mass tests being one of the most effective controls.

When multitrait data are available, the preferred models are those that account for correlations between phenotypic traits, because a moderate or large degree of correlation increases genomic prediction accuracy. For this reason, in this article, we explore Bayesian multitrait kernel methods for genomic prediction and illustrate the power of these models with three real datasets. The kernels under study were the linear, Gaussian, polynomial, and sigmoid kernels; they were compared with the conventional Ridge regression and GBLUP multitrait models.
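
The four kernels named above have standard textbook forms, sketched here with NumPy. The scaling by the number of markers `p`, the default values of `gamma`, `coef0`, and `degree`, and the simulated matrix are assumptions made for illustration; the article's own parameterization may differ.

```python
import numpy as np

def linear_kernel(X, Z):
    """K[i, j] = x_i . z_j / p, with p the number of markers (assumed scaling)."""
    return X @ Z.T / X.shape[1]

def polynomial_kernel(X, Z, degree=2, gamma=None, coef0=1.0):
    """K[i, j] = (gamma * x_i . z_j + coef0) ** degree."""
    gamma = 1.0 / X.shape[1] if gamma is None else gamma
    return (gamma * (X @ Z.T) + coef0) ** degree

def gaussian_kernel(X, Z, gamma=None):
    """K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    gamma = 1.0 / X.shape[1] if gamma is None else gamma
    sq_dist = (X**2).sum(axis=1)[:, None] + (Z**2).sum(axis=1)[None, :] - 2.0 * (X @ Z.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def sigmoid_kernel(X, Z, gamma=None, coef0=1.0):
    """K[i, j] = tanh(gamma * x_i . z_j + coef0)."""
    gamma = 1.0 / X.shape[1] if gamma is None else gamma
    return np.tanh(gamma * (X @ Z.T) + coef0)

# Simulated stand-in for a marker matrix (5 lines, 20 markers); not real data.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
K_lin = linear_kernel(X, X)
K_pol = polynomial_kernel(X, X)
K_gau = gaussian_kernel(X, X)
K_sig = sigmoid_kernel(X, X)
```

Note that the sigmoid kernel is not guaranteed to be positive semidefinite for every choice of `gamma` and `coef0`, so a model that treats it as a covariance matrix may need to check or adjust it.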

Genomic selection (GS) is revolutionizing conventional ways of developing new plants and animals. However, because it is a predictive methodology, GS strongly depends on statistical and machine learning to perform these predictions. For continuous outcomes, more models are available for GS.

In genomic selection, choosing the statistical machine learning model is of paramount importance. In this paper, we present an application of a zero-altered random forest model with two versions (ZAP_RF and ZAPC_RF) to deal with excess zeros in count response variables. The proposed model was compared with the conventional random forest (RF) model and with the conventional Generalized Poisson Ridge regression (GPR) using two real datasets, and we found that, in terms of prediction performance, the proposed zero-inflated random forest model outperformed the conventional RF and GPR models.

The primary objective of this paper is to provide a guide on implementing Bayesian generalized kernel regression methods for genomic prediction in the statistical software R. Such methods are quite efficient for capturing complex non-linear patterns that conventional linear regression models cannot. Furthermore, these methods are also powerful for leveraging environmental covariates, such as genotype × environment (G×E) prediction, among others.

The paradigm called genomic selection (GS) is a revolutionary way of developing new plants and animals. This is a predictive methodology, since it uses learning methods to perform its task. Unfortunately, there is no universal model that can be used for all types of predictions; for this reason, specific methodologies are required for each type of output (response variables).

In this paper we propose a Bayesian multi-output regressor stacking (BMORS) model that is a generalization of the multi-trait regressor stacking method. The proposed BMORS model consists of two stages: in the first stage, a univariate genomic best linear unbiased prediction (GBLUP) model including genotype × environment interaction (GE) is implemented for each of the L traits under study; then, in the second stage, the predictions of all traits are included as covariates in a Ridge regression model. The main objectives of this research were to study alternative models to the existing Bayesian multi-trait multi-environment (BMTME) model with respect to (1) genomic-enabled prediction accuracy, and (2) potential advantages in terms of computing resources and implementation.
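
The two-stage structure described above can be sketched in a simplified, non-Bayesian form. This is not the BMORS implementation: plain closed-form Ridge regression on markers stands in for the first-stage GBLUP with G×E, the second stage is an ordinary Ridge fit rather than a Bayesian one, and stage-1 predictions are computed in-sample for brevity. All function names, penalty values, and the simulated data are assumptions.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form Ridge coefficients; Y may hold several traits as columns."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def stacking_fit(X, Y, lam1=10.0, lam2=1.0):
    """Stage 1: one marker-based model per trait. Stage 2: Ridge on stage-1 predictions."""
    B1 = ridge_fit(X, Y, lam1)   # (p, L): column t holds the univariate fit for trait t
    P = X @ B1                   # (n, L): stage-1 predictions for all L traits
    B2 = ridge_fit(P, Y, lam2)   # (L, L): each trait regressed on all L predictions
    return B1, B2

def stacking_predict(X_new, B1, B2):
    """Apply stage 1, then feed its predictions through stage 2."""
    return (X_new @ B1) @ B2

# Simulated stand-in data: 80 lines, 50 markers, L = 3 correlated traits; not real data.
rng = np.random.default_rng(0)
n, p, L = 80, 50, 3
X = rng.standard_normal((n, p))
shared = X @ (0.2 * rng.standard_normal(p))           # signal shared across traits
Y = shared[:, None] + 0.5 * rng.standard_normal((n, L))
Y = Y - Y.mean(axis=0)                                # center traits (no intercept in the sketch)

B1, B2 = stacking_fit(X[:60], Y[:60])
Y_hat = stacking_predict(X[60:], B1, B2)
```

The second stage is where information is shared across traits: each trait's final prediction is a weighted combination of the first-stage predictions for all L traits, so correlated traits can borrow strength from one another at low computational cost.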

In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from very large multivariate normal distributions.

There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods.

When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy is to analyze a single trait at a time while taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously take into account correlated count traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm, for which we developed a Markov Chain Monte Carlo (MCMC) algorithm with noninformative priors.
