Publications by Maulik Shukla | LitMetric

Publications by authors named "Maulik Shukla"

Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance.

Alice R Wattam, Nicole Bowers, Thomas Brettin, Neal Conrad, Clark Cucinell, Maulik Shukla

Methods Mol Biol

May 2024

As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.

A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening.

Priyanka Vasanthakumari, Yitan Zhu, Thomas Brettin, Alexander Partin, Maulik Shukla

Cancers (Basel)

January 2024

It is well-known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models.

Integration of Computational Docking into Anti-Cancer Drug Response Prediction Models.

Oleksandr Narykov, Yitan Zhu, Thomas Brettin, Yvonne A Evrard, Alexander Partin, Maulik Shukla

Cancers (Basel)

December 2023

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions.

US National Institutes of Health Prioritization of SARS-CoV-2 Variants.

Sam Turner, Arghavan Alisoltani, Debbie Bratt, Liel Cohen-Lavi, Bethany L Dearlove, Maulik Shukla

Emerg Infect Dis

May 2023

Article Synopsis

Since late 2020, new SARS-CoV-2 variants have frequently appeared, showing differences that may help them evade immunity from past infections.
The Early Detection group within the NIH's SARS-CoV-2 program utilizes bioinformatics to track these variants' emergence, spread, and traits, highlighting important ones for further study.
Since April 2021, this group has successfully prioritized variants each month, assisting NIH researchers by providing timely data on SARS-CoV-2 evolution for guiding experiments.

Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images.

Alexander Partin, Thomas Brettin, Yitan Zhu, James M Dolezal, Sara Kochanny, Maulik Shukla

Front Med (Lausanne)

March 2023

Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate multimodal neural network (MM-Net) and data augmentation for DRP in PDXs.

TULIP: An RNA-seq-based Primary Tumor Type Prediction Tool Using Convolutional Neural Networks.

Sara Jones, Matthew Beyers, Maulik Shukla, Fangfang Xia, Thomas Brettin

Cancer Inform

December 2022

Background: With cancer as one of the leading causes of death worldwide, accurate primary tumor type prediction is critical in identifying genetic factors that can inhibit or slow tumor progression. There have been efforts to categorize primary tumor types with gene expression data using machine learning, and more recently with deep learning, in the last several years.

Methods: In this paper, we developed four 1-dimensional (1D) Convolutional Neural Network (CNN) models to classify RNA-seq count data as one of 17 highly represented primary tumor types or 32 primary tumor types regardless of imbalanced representation.

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics.

Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Maulik Shukla

bioRxiv

November 2022

We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.

Early detection of emerging SARS-CoV-2 variants of interest for experimental evaluation.

Zachary S Wallace, James Davis, Anna Maria Niewiadomska, Robert D Olson, Maulik Shukla

Front Bioinform

October 2022

Since the beginning of the COVID-19 pandemic, SARS-CoV-2 has demonstrated its ability to rapidly and continuously evolve, leading to the emergence of thousands of different sequence variants, many with distinctive phenotypic properties. Fortunately, the broad application of next generation sequencing (NGS) across the globe has produced a wealth of SARS-CoV-2 genome sequences, offering a comprehensive picture of how this virus is evolving so that accurate diagnostics, reliable therapeutics, and prophylactic vaccines against COVID-19 can be developed and maintained. The millions of SARS-CoV-2 sequences deposited into genomic sequencing databases, including GenBank, BV-BRC, and GISAID, are annotated with the dates and geographic locations of sample collection, and can be aligned to and compared with the Wuhan-Hu-1 reference genome to extract their constellation of nucleotide and amino acid substitutions.

Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR.

Robert D Olson, Rida Assaf, Thomas Brettin, Neal Conrad, Clark Cucinell, Maulik Shukla

Nucleic Acids Res

January 2023

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.

Defining the risk of SARS-CoV-2 variants on immune protection.

Marciela M DeGrace, Elodie Ghedin, Matthew B Frieman, Florian Krammer, Alba Grifoni, Maulik Shukla

Nature

May 2022

Article Synopsis

The emergence of new SARS-CoV-2 variants threatens the effectiveness of immunity from previous infections or vaccinations.
To tackle this issue, the NIH launched the SARS-CoV-2 Assessment of Viral Evolution (SAVE) program for real-time assessment of variant risks that might impact transmission and vaccine efficacy.
The program focuses on gathering and analyzing data on emerging variants and their effects on immunity, using animal models, while also addressing future challenges in monitoring rapidly evolving viruses.

Analysis of the ARTIC Version 3 and Version 4 SARS-CoV-2 Primers and Their Impact on the Detection of the G142D Amino Acid Substitution in the Spike Protein.

James J Davis, S Wesley Long, Paul A Christensen, Randall J Olsen, Robert Olson, Maulik Shukla

Microbiol Spectr

December 2021

The ARTIC Network provides a common resource of PCR primer sequences and recommendations for amplifying SARS-CoV-2 genomes. The initial tiling strategy was developed with the reference genome Wuhan-01, and subsequent iterations have addressed areas of low amplification and sequence drop out. Recently, a new version (V4) was released, based on new variant genome sequences, in response to the realization that some V3 primers were located in regions with key mutations.

A cross-study analysis of drug response prediction in cancer cell lines.

Fangfang Xia, Jonathan Allen, Prasanna Balaprakash, Thomas Brettin, Cristina Garcia-Cardona, Maulik Shukla

Brief Bioinform

January 2022

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets.

A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes.

Margo VanOeffelen, Marcus Nguyen, Derya Aytan-Aktug, Thomas Brettin, Emily M Dietrich, Maulik Shukla

Brief Bioinform

November 2021

Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data.

Publisher Correction: Converting tabular data into images for deep learning with convolutional neural networks.

Yitan Zhu, Thomas Brettin, Fangfang Xia, Alexander Partin, Maulik Shukla

Sci Rep

July 2021

Converting tabular data into images for deep learning with convolutional neural networks.

Yitan Zhu, Thomas Brettin, Fangfang Xia, Alexander Partin, Maulik Shukla

Sci Rep

May 2021

Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image.

Learning curves for drug response prediction in cancer cell lines.

Alexander Partin, Thomas Brettin, Yvonne A Evrard, Yitan Zhu, Hyunseung Yoo, Maulik Shukla

BMC Bioinformatics

May 2021

Background: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data.

Methods: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets.

Molecular Architecture of Early Dissemination and Massive Second Wave of the SARS-CoV-2 Virus in a Major Metropolitan Area.

S Wesley Long, Randall J Olsen, Paul A Christensen, David W Bernard, James J Davis, Maulik Shukla

mBio

October 2020

We sequenced the genomes of 5,085 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains causing two coronavirus disease 2019 (COVID-19) disease waves in metropolitan Houston, TX, an ethnically diverse region with 7 million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston and from viruses recovered in an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently.

Ensemble transfer learning for the prediction of anti-cancer drug response.

Yitan Zhu, Thomas Brettin, Yvonne A Evrard, Alexander Partin, Fangfang Xia, Maulik Shukla

Sci Rep

October 2020

Transfer learning, which transfers patterns learned on a source dataset to a related target dataset for constructing prediction models, has been shown effective in many applications. In this paper, we investigate whether transfer learning can be used to improve the performance of anti-cancer drug response prediction models. Previous transfer learning studies for drug response prediction focused on building models to predict the response of tumor cells to a specific drug treatment.

Predicting antimicrobial resistance using conserved genes.

Marcus Nguyen, Robert Olson, Maulik Shukla, Margo VanOeffelen, James J Davis

PLoS Comput Biol

October 2020

A growing number of studies are using machine learning models to accurately predict antimicrobial resistance (AMR) phenotypes from bacterial sequence data. Although these studies are showing promise, the models are typically trained using features derived from comprehensive sets of AMR genes or whole genome sequences and may not be suitable for use when genomes are incomplete. In this study, we explore the possibility of predicting AMR phenotypes using incomplete genome sequence data.

Molecular Architecture of Early Dissemination and Massive Second Wave of the SARS-CoV-2 Virus in a Major Metropolitan Area.

S Wesley Long, Randall J Olsen, Paul A Christensen, David W Bernard, James J Davis, Maulik Shukla

medRxiv

September 2020

We sequenced the genomes of 5,085 SARS-CoV-2 strains causing two COVID-19 disease waves in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston, and an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently.

Enhanced Co-Expression Extrapolation (COXEN) Gene Selection Method for Building Anti-Cancer Drug Response Prediction Models.

Yitan Zhu, Thomas Brettin, Yvonne A Evrard, Fangfang Xia, Alexander Partin, Maulik Shukla

Genes (Basel)

September 2020

The co-expression extrapolation (COXEN) method has been successfully used in multiple studies to select genes for predicting the response of tumor cells to a specific drug treatment. Here, we enhance the COXEN method to select genes that are predictive of the efficacies of multiple drugs for building general drug response prediction models that are not specific to a particular drug. The enhanced COXEN method first ranks the genes according to their prediction power for each individual drug and then takes a union of top predictive genes of all the drugs, among which the algorithm further selects genes whose co-expression patterns are well preserved between cancer cases for building prediction models.
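
The three steps named in this abstract (per-drug gene ranking, union of the top genes, co-expression filtering) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: absolute Pearson correlation stands in for "prediction power", and preservation is scored by correlating each gene's co-expression profile across two datasets. The function names and the parameters top_k and n_final are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 for empty or constant input."""
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_genes(expr_a, responses, expr_b, top_k, n_final):
    """Hypothetical multi-drug gene selection sketch.

    expr_a:    gene -> expression values over cases in dataset A
    responses: drug -> response values over the same cases in A
    expr_b:    gene -> expression values over cases in dataset B
    """
    # Step 1: for each drug, rank genes by |correlation with response|
    # (an assumed proxy for prediction power) and keep the top_k.
    # Step 2: take the union of those per-drug top genes.
    candidates = set()
    for resp in responses.values():
        ranked = sorted(
            expr_a,
            key=lambda g: abs(pearson(expr_a[g], resp)),
            reverse=True,
        )
        candidates.update(ranked[:top_k])
    genes = sorted(candidates)

    # Step 3: score how well each gene's co-expression profile
    # (its correlations with the other candidates) is preserved
    # between dataset A and dataset B.
    def profile(expr, g):
        return [pearson(expr[g], expr[h]) for h in genes if h != g]

    score = {g: pearson(profile(expr_a, g), profile(expr_b, g)) for g in genes}
    return sorted(genes, key=lambda g: score[g], reverse=True)[:n_final]
```

A gene with constant expression scores zero in step 1 and therefore does not reach the co-expression filter unless top_k is large enough to include every gene.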

Genomic Analysis of Metastatic Solid Tumors in Veterans: Findings From the VHA National Precision Oncology Program.

Pradeep J Poonnen, Jill E Duffy, Bradley Hintze, Maulik Shukla, Thomas S Brettin

JCO Precis Oncol

August 2019

Purpose: The Veterans Health Administration (VHA) is the largest cancer care provider in the United States, with the added challenge of serving more than twice the percentage of patients with cancer in rural areas than the national average. The VHA established the National Precision Oncology Program in 2016 to implement and standardize the practice of precision oncology across the diverse VHA system.

Methods: Tumor or peripheral blood specimens from veterans with advanced solid tumors who were eligible for treatment were submitted for next-generation sequencing (NGS) at two commercial laboratories.

A synthesis of bacterial and archaeal phenotypic trait data.

Joshua S Madin, Daniel A Nielsen, Maria Brbic, Ross Corkrey, David Danko, Maulik Shukla

Sci Data

June 2020

A synthesis of phenotypic and quantitative genomic traits is provided for bacteria and archaea, in the form of a scripted, reproducible workflow that standardizes and merges 26 sources. The resulting unified dataset covers 14 phenotypic traits, 5 quantitative genomic traits, and 4 environmental characteristics for approximately 170,000 strain-level and 15,000 species-aggregated records. It spans all habitats including soils, marine and fresh waters and sediments, host-associated and thermal.

The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities.

James J Davis, Alice R Wattam, Ramy K Aziz, Thomas Brettin, Ralph Butler, Maulik Shukla

Nucleic Acids Res

January 2020

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org).

Predicting tumor cell line response to drug pairs with deep learning.

Fangfang Xia, Maulik Shukla, Thomas Brettin, Cristina Garcia-Cardona, Judith Cohn

BMC Bioinformatics

December 2018

Background: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity.

Results: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance.
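
For readers unfamiliar with the "variance explained" figure, the sketch below shows how such a number is commonly computed. It assumes the standard coefficient of determination (R squared); the excerpt does not state exactly how the authors computed their 94%, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Fraction of variance in `observed` explained by `predicted`.

    Assumes the standard definition: 1 - SS_res / SS_tot.
    `observed` must not be constant, or SS_tot is zero.
    """
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a model that always predicts the mean response scores 0, a perfect model scores 1, and a value of 0.94 would correspond to the figure quoted above.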
