Publications by Josep Arús-Pous

SMILES-based deep generative scaffold decorator for de-novo drug design.

Josep Arús-Pous, Atanas Patronov, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond

J Cheminform

May 2020

Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e.

REINVENT 2.0: An AI Tool for De Novo Drug Design.

Thomas Blaschke, Josep Arús-Pous, Hongming Chen, Christian Margreitter, Christian Tyrchan

J Chem Inf Model

December 2020

In the past few years, we have witnessed a renaissance of the field of molecular de novo drug design. The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations.

A Potent and Selective Janus Kinase Inhibitor with a Chiral 3D-Shaped Triquinazine Ring System from Chemical Space.

Kris Meier, Josep Arús-Pous, Jean-Louis Reymond

Angew Chem Int Ed Engl

January 2021

The generated databases (GDBs) enumerate billions of possible molecules following simple rules of chemical stability and synthetic feasibility. Exploring the GDBs shows that many chiral, 3D-shaped ring systems, often containing quaternary centers, have never been exploited for drug design. Shown herein is that such ring systems can be useful for medicinal chemistry by using the example of the enantioselective synthesis of triquinazine, a novel chiral piperazine analogue derived from angular triquinane.

The Generated Databases (GDBs) as a Source of 3D-shaped Building Blocks for Use in Medicinal Chemistry and Drug Discovery.

Kris Meier, Sven Bühlmann, Josep Arús-Pous, Jean-Louis Reymond

Chimia (Aarau)

April 2020

Drug discovery is in constant need of new molecules to develop drugs addressing unmet medical needs. To assess the chemical space available for drug design, our group investigates the generated databases (GDBs), which list all possible organic molecules up to a defined size; the largest of these is GDB-17, featuring 166.4 billion molecules of up to 17 non-hydrogen atoms.

Exploring Chemical Space with Machine Learning.

Josep Arús-Pous, Mahendra Awale, Daniel Probst, Jean-Louis Reymond

Chimia (Aarau)

December 2019

Chemical space is a concept to organize molecular diversity by postulating that different molecules occupy different regions of a mathematical space where the position of each molecule is defined by its properties. Our aim is to develop methods to explicitly explore chemical space in the area of drug discovery. Here we review our implementations of machine learning in this project, including our use of deep neural networks to enumerate the GDB13 database from a small sample set, to generate analogs of drugs and natural products after training with fragment-size molecules, and to predict the polypharmacology of molecules after training with known bioactive compounds from ChEMBL.

A de novo molecular generation method using latent vector based generative adversarial network.

Oleksii Prykhodko, Simon Viet Johansson, Panagiotis-Christos Kotsias, Josep Arús-Pous, Esben Jannik Bjerrum

J Cheminform

December 2019

Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds.

Applications of Deep-Learning in Exploiting Large-Scale and Heterogeneous Compound Data in Industrial Pharmaceutical Research.

Laurianne David, Josep Arús-Pous, Johan Karlsson, Ola Engkvist, Esben Jannik Bjerrum

Front Pharmacol

November 2019

In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data at a much faster pace and with a higher quality than before. Besides the structure-activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments thanks to the advancement of next-generation sequencing and automated microscopy technologies. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms in order to enable efficient handling of screening collections of thousands to millions of compounds.

Randomized SMILES strings improve the quality of molecular generative models.

Josep Arús-Pous, Simon Viet Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan

J Cheminform

November 2019

Recurrent neural networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set.

Exploring the GDB-13 chemical space using deep generative models.

Josep Arús-Pous, Thomas Blaschke, Silas Ulander, Jean-Louis Reymond, Hongming Chen

J Cheminform

March 2019

Recent applications of recurrent neural networks (RNNs) enable training models that sample the chemical space. In this study we train RNNs on molecular string representations (SMILES) using a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained with 1 million structures (0.

Deep Learning Invades Drug Design and Synthesis.

Josep Arús-Pous, Daniel Probst, Jean-Louis Reymond

Chimia (Aarau)

February 2018

Chemical Space: Big Data Challenge for Molecular Diversity.

Mahendra Awale, Ricardo Visini, Daniel Probst, Josep Arús-Pous, Jean-Louis Reymond

Chimia (Aarau)

October 2017

Chemical space describes all possible molecules as well as multi-dimensional conceptual spaces representing the structural diversity of these molecules. Part of this chemical space is available in public databases ranging from thousands to billions of compounds. Exploiting these databases for drug discovery represents a typical big data problem limited by computational power, data storage and data access capacity.

Virtual Exploration of the Ring Systems Chemical Universe.

Ricardo Visini, Josep Arús-Pous, Mahendra Awale, Jean-Louis Reymond

J Chem Inf Model

November 2017

Here, we explore the chemical space of all virtually possible organic molecules focusing on ring systems, which represent the cyclic cores of organic molecules obtained by removing all acyclic bonds and converting all remaining atoms to carbon. This approach circumvents the combinatorial explosion encountered when enumerating the molecules themselves. We report the chemical universe database GDB4c containing 916 130 ring systems up to four saturated or aromatic rings and maximum ring size of 14 atoms and GDB4c3D containing the corresponding 6 555 929 stereoisomers.
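
The ring-system definition in this abstract (remove all acyclic bonds, convert the remaining atoms to carbon) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ring_system`, the plain atom/bond dictionary representation, and the use of bridge detection to identify acyclic bonds are assumptions made for illustration, and bond orders and stereochemistry are ignored.

```python
from collections import defaultdict


def ring_system(atoms, bonds):
    """Hypothetical helper: reduce a molecular graph to its cyclic core.

    atoms: dict mapping atom index -> element symbol
    bonds: iterable of (i, j) index pairs (simple graph, no bond orders)
    Returns (core_atoms, core_bonds) with every acyclic bond removed and
    every remaining atom relabelled as carbon.
    """
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)

    # A bond is acyclic exactly when it is a bridge of the graph,
    # so we find bridges with a depth-first low-link search.
    discovered, low = {}, {}
    bridges = set()
    counter = 0

    def visit(node, parent):
        nonlocal counter
        discovered[node] = low[node] = counter
        counter += 1
        for neighbour in adjacency[node]:
            if neighbour == parent:
                continue
            if neighbour in discovered:
                low[node] = min(low[node], discovered[neighbour])
            else:
                visit(neighbour, node)
                low[node] = min(low[node], low[neighbour])
                if low[neighbour] > discovered[node]:
                    bridges.add(frozenset((node, neighbour)))

    for atom in atoms:
        if atom not in discovered:
            visit(atom, None)

    core_bonds = {frozenset(bond) for bond in bonds} - bridges
    core_atom_ids = set().union(*core_bonds) if core_bonds else set()
    # Convert every remaining atom to carbon.
    core_atoms = {index: "C" for index in core_atom_ids}
    return core_atoms, core_bonds


# Example: 2-methylpyridine (ring atoms 0-5, methyl carbon 6).
atoms = {0: "N", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C"}
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 6)]
core_atoms, core_bonds = ring_system(atoms, bonds)
print(sorted(core_atoms.items()))
```

In this sketch, rings joined only by an acyclic linker come back as separate components of one result, so splitting them into individual ring systems would need an additional connected-components step.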
