The past decade has witnessed rapid progress in deep learning for molecular design, owing to the availability of invertible and invariant representations for molecules such as the simplified molecular-input line-entry system (SMILES), which has powered cheminformatics since the late 1980s. However, designing the elemental components of solid-state materials and their structural arrangement to achieve desired properties remains a long-standing challenge in physics, chemistry and biology. This is primarily because, unlike in molecular inverse design, there is no invertible crystal representation that satisfies translational, rotational, and permutational invariances. To address this issue, we have developed a simplified line-input crystal-encoding system (SLICES), a string-based crystal representation that satisfies both invertibility and invariances. The reconstruction routine of SLICES successfully reconstructed 94.95% of over 40,000 structurally and chemically diverse crystal structures, demonstrating unprecedented invertibility. Furthermore, by encoding only compositional and topological data, SLICES guarantees invariances. We demonstrate the application of SLICES in the inverse design of direct narrow-gap semiconductors for optoelectronic applications. As a string-based, invertible, and invariant crystal representation, SLICES shows promise as a useful tool for in silico materials discovery.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622439 | PMC |
| http://dx.doi.org/10.1038/s41467-023-42870-7 | DOI Listing |
Front Pharmacol
January 2025
Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, United States.
Introduction: Recent advances in 3D structure-based deep learning approaches demonstrate improved accuracy in predicting protein-ligand binding affinity in drug discovery. These methods complement physics-based computational modeling such as molecular docking for virtual high-throughput screening. Despite recent advances and improved predictive performance, most methods in this category primarily rely on utilizing co-crystal complex structures and experimentally measured binding affinities as both input and output data for model training.
Sci Rep
January 2025
Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates.
The problem of protein structure determination is usually solved by X-ray crystallography. Several in silico deep learning methods have been developed to predict the crystallization propensities of proteins from their sequences, overcoming the high attrition rate, experimental cost, and extensive trial-and-error settings. In this work, we benchmark the power of open protein language models (PLMs) through the TRILL platform, a bespoke framework democratizing the usage of PLMs for the task of predicting crystallization propensities of proteins.
Sci Rep
January 2025
Department of Materials Science and Engineering, Kyoto University, Sakyo, Kyoto, 606-8501, Japan.
The discovery of novel materials is crucial for developing new functional materials. This study introduces a predictive model designed to forecast complex multi-component oxide compositions, leveraging data derived from simpler pseudo-binary systems. By applying tensor decomposition and machine learning techniques, we transformed pseudo-binary oxide compositions from the Inorganic Crystal Structure Database (ICSD) into tensor representations, capturing key chemical trends such as oxidation states and periodic positions.
Comput Biol Chem
December 2024
Guangdong Provincial Key Laboratory of Pharmaceutical Bioactive Substances, Guangdong Pharmaceutical University, Guangzhou 510006, PR China. Electronic address:
In the present study, we uncovered and validated potential biomarkers related to gout, characterized by the accumulation of sodium urate crystals in different joint and non-joint structures. The data set GSE160170 was obtained from the GEO database. We conducted differential gene expression analysis, GO enrichment assessment, and KEGG pathway analysis to understand the underlying processes.
Sci Rep
December 2024
Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputing Center in Jinan), Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250013, Shandong, P. R. China.
Crystal structure similarity is useful for the chemical analysis of today's large materials databases and for data mining of new materials. Here we propose to use the two-dimensional Wasserstein distance (earth mover's distance) to measure the compositional similarity between different compounds, based on the periodic table representation of compositions. To demonstrate the effectiveness of our approach, 1586 Cu-S based compounds are taken from the Inorganic Crystal Structure Database (ICSD) to form a validation dataset.