Publications by authors named "Karoly Heberger"

Extended similarity indices (i.e., generalization of pairwise similarity) have recently gained importance because of their simplicity, fast computation and superiority in tasks like diversity picking.

Non-negative matrix factorization (NMF) efficiently reduces high dimensionality for many-objective ranking problems. In multi-objective optimization, as long as only three or four conflicting viewpoints are present, an optimal solution can be determined by finding the Pareto front. When the number of the objectives increases, the multi-objective problem evolves into a many-objective optimization task, where the Pareto front becomes oversaturated.
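
The Pareto front mentioned above can be illustrated with a small non-dominated filter. This is a minimal sketch, not the method of the paper: it assumes every objective is to be minimized, and the function names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if point a is at least as good as b in every objective
    and strictly better in at least one (assumption: lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (quadratic-time sketch)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical two-objective data
solutions = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(solutions)
```

As objectives are added, fewer points dominate one another and the front tends to contain most of the data, which is the oversaturation the abstract refers to.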

Molecular dynamics (MD) is a core methodology of molecular modeling and computational design for the study of the dynamics and temporal evolution of molecular systems. MD simulations have particularly benefited from the rapid increase of computational power that has characterized the past decades of computational chemical research, being the first method to be successfully migrated to the GPU infrastructure. While new-generation MD software is capable of delivering simulations on an ever-increasing scale, relatively less effort is invested in developing postprocessing methods that can keep up with the quickly expanding volumes of data that are being generated.

The screening of compounds for ADME-Tox targets plays an important role in drug design. QSPR models can increase the speed of these specific tasks, although the performance of the models highly depends on several factors, such as the applied molecular descriptors. In this study, a detailed comparison of the most popular descriptor groups has been carried out for six main ADME-Tox classification targets: Ames mutagenicity, P-glycoprotein inhibition, hERG inhibition, hepatotoxicity, blood-brain-barrier permeability, and cytochrome P450 2C9 inhibition.

Extended (or n-ary) similarity indices have been recently proposed to extend the comparative analysis of binary strings. Going beyond the traditional notion of pairwise comparisons, these novel indices allow comparing any number of objects at the same time. This results in a remarkable efficiency gain with respect to other approaches, since now we can compare N molecules in O(N) instead of the common quadratic O(N²) timescale.
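
The linear scaling can be illustrated with a simplified sketch. It assumes that the comparison works on per-bit column sums accumulated in a single pass over the N fingerprints, and that a coincidence threshold decides whether a bit position counts as similar or dissimilar. The function names, the default threshold (N mod 2), and the unweighted Tanimoto-like ratio are assumptions made for illustration; the published indices may differ, for example by weighting the counters.

```python
def column_sums(fingerprints):
    """Add up each bit position over all fingerprints in one pass (O(N) in N)."""
    sums = [0] * len(fingerprints[0])
    for fp in fingerprints:
        for k, bit in enumerate(fp):
            sums[k] += bit
    return sums


def extended_tanimoto_like(fingerprints, threshold=None):
    """Unweighted, Tanimoto-like n-ary similarity (illustrative assumption).

    A bit position is 'similar' if its column sum is far enough from an
    even split of ones and zeros; otherwise it counts as 'dissimilar'.
    """
    n = len(fingerprints)
    if threshold is None:
        threshold = n % 2  # assumed default
    ones_similar = 0
    dissimilar = 0
    for c in column_sums(fingerprints):
        delta = abs(2 * c - n)  # |ones - zeros| in this column
        if delta > threshold:
            if 2 * c > n:
                ones_similar += 1  # mostly ones
            # mostly zeros: ignored, as in the pairwise Tanimoto coefficient
        else:
            dissimilar += 1
    denom = ones_similar + dissimilar
    return ones_similar / denom if denom else 0.0


# Hypothetical 4-bit fingerprints
fps = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0]]
similarity = extended_tanimoto_like(fps)
```

A pairwise approach would instead evaluate all N(N-1)/2 pairs, which is where the quadratic cost comes from.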

The Promethee-GAIA method is a multicriteria decision support technique that defines the aggregated ranks of multiple criteria and visualizes them based on Principal Component Analysis (PCA). In the case of numerous criteria, the PCA biplot-based visualization does not reveal how a criterion influences the decision problem. The central question is how the Promethee-GAIA-based decision-making process can be improved to gain more interpretable results that reveal more characteristic inner relationships between the criteria.

Quantification of similarities between protein sequences or DNA/RNA strands is a (sub-)task that is ubiquitously present in bioinformatics workflows, and is usually accomplished by pairwise comparisons of sequences, utilizing simple (e.g., percent identity) or more intricate concepts (e.g., substitution scoring matrices). Complex tasks (such as clustering) rely on a large number of pairwise comparisons under the hood, instead of a direct quantification of set similarities. Based on our recently introduced framework that enables multiple comparisons of binary molecular fingerprints (i.e., direct calculation of the similarity of fingerprint sets), here we introduce novel symmetric similarity indices for analogous calculations on sets of character sequences with more than two possible items (e.g., four for DNA/RNA sequences, or 20 for protein sequences).

In this review, we outline the current trends in the field of machine learning-driven classification studies related to ADME (absorption, distribution, metabolism and excretion) and toxicity endpoints from the past six years (2015-2021). The study focuses only on classification models with large datasets (i.e.

In recent decades, eye-movement detection technology has improved significantly, and eye-trackers are available not only as standalone research tools but also as computer peripherals. This rapid spread gives further opportunities to measure the eye-movements of participants. The current paper provides classification models for the prediction of food choice and selects the best one.

Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and similarity metrics, there were no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules) at the same time.

Despite being a central concept in cheminformatics, molecular similarity has so far been limited to the simultaneous comparison of only two molecules at a time and using one index, generally the Tanimoto coefficient. In a recent contribution we have not only introduced a complete mathematical framework for extended similarity calculations, (i.e.

Similarity measures are widely used in various areas from taxonomy to cheminformatics. To this end, a large number of similarity and distance measures (or, collectively, comparative measures) have been introduced, with only a few studies directed to revealing their inner relationships. We present a thorough analytical study of the conditions leading to two comparative measures providing equivalent results over a given set of molecules.

Applied datasets can vary from a few hundred to thousands of samples in typical quantitative structure-activity/property (QSAR/QSPR) relationships and classification. However, the size of the datasets and the train/test split ratios can greatly affect the outcome of the models, and thus the classification performance itself. We compared several combinations of dataset sizes and split ratios with five different machine learning algorithms to find the differences or similarities and to select the best parameter settings in nonbinary (multiclass) classification.

Recently, ¹H NMR (nuclear magnetic resonance) spectroscopy was presented as a viable option for the quality assurance of foods and beverages, such as wine products. Here, a complex chemometric analysis of red and white wine samples was carried out based on their ¹H NMR spectra. The extreme gradient boosting (XGBoost) machine learning algorithm was applied for the wine variety classification with an iterative double cross-validation loop, developed during the present work.

Finding optimal solutions usually requires multicriteria optimization. The sum of ranking differences (SRD) algorithm can efficiently solve such problems. Its principles and earlier applications will be discussed here, along with meta-analyses of papers published in various subfields of food science, such as analytics in food chemistry, food engineering, food technology, food microbiology, quality control, and sensory analysis.

Extracellular vesicles (EVs) are lipid bilayer-bounded particles that are actively synthesized and released by cells. The main components of EVs are lipids, proteins, and nucleic acids and their composition is characteristic to their type and origin, and it reveals the physiological and pathological conditions of the parent cells. The concentration and protein composition of EVs closely relate to their functions; therefore, total protein determination can assist in EV-based diagnostics and disease prognosis.

Sum of Ranking Differences is an innovative statistical method that ranks competing solutions based on a reference point. The latter might arise naturally, or can be aggregated from the data. We provide two case studies to feature both possibilities.
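
The core idea can be sketched as follows: each competing solution and the reference are converted to ranks over the same objects, and the absolute rank differences are summed, so a smaller sum means closer agreement with the reference. This is a minimal sketch of that core step only; the function names are hypothetical, the row-wise mean is just one assumed way to aggregate a reference from the data, and any validation or normalization steps of the published method are omitted.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def sum_of_ranking_differences(solution, reference):
    """Sum of absolute differences between solution ranks and reference ranks."""
    return sum(abs(a - b) for a, b in zip(ranks(solution), ranks(reference)))


def row_mean_reference(solutions):
    """Aggregate a reference from the data as the row-wise mean (assumption)."""
    n_rows = len(solutions[0])
    return [sum(s[r] for s in solutions) / len(solutions) for r in range(n_rows)]


# Hypothetical scores of three methods on four objects
methods = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}

# Case 1: the reference arises naturally (e.g. known values)
natural = [1, 2, 3, 4]
srd_natural = {m: sum_of_ranking_differences(v, natural) for m, v in methods.items()}

# Case 2: the reference is aggregated from the data
aggregated = row_mean_reference(list(methods.values()))
srd_aggregated = {m: sum_of_ranking_differences(v, aggregated) for m, v in methods.items()}
```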

Quantitation of surface roughness is difficult if subtle but significant differences cause an uncommon variance. We used atomic force microscopy to measure the surface roughness of polyethylene terephthalate (PET) fibers before and after a 30 s plasma treatment of 300 W. Samples were measured multiple times at different locations, in four scan sizes.

How far-reaching is the influence of the urban area over the mineral composition of the mushroom? To answer this question, we monitored the metal uptake behavior of this fungus relying on the soil properties. We sampled mushroom and soil from six forests according to an urbanization gradient, and two city parks in Cluj-Napoca (Romania). The elements were quantified using inductively coupled plasma - optical emission spectroscopy (ICP-OES).

How far-reaching is the influence of the urban area over the mineral composition of the Russula cyanoxantha mushroom? We studied the metal uptake behavior of this fungus relying on the soil properties. We sampled mushroom and soil from six forests according to an urbanization gradient, and two city parks in Cluj-Napoca (Romania). The elements were quantified using inductively coupled plasma - optical emission spectroscopy (ICP-OES).

Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well.

Ensemble docking is a widely applied concept in structure-based virtual screening (to at least partly account for protein flexibility), usually granting a significant performance gain at a modest cost of speed. From the individual, single-structure docking scores, a consensus score needs to be produced by data fusion: this is usually done by taking the best docking score from the available pool (in most cases, and in this study as well, this is the minimum score). Nonetheless, there are a number of other fusion rules that can be applied.
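
Data fusion of this kind can be sketched as a function that reduces the per-structure scores of each compound to one consensus value. Only the minimum rule is named in the abstract; the mean, median, and maximum rules below are illustrative assumptions about what "other fusion rules" might look like, and `fuse_scores` is a hypothetical name.

```python
from statistics import mean, median

# "min" is the rule named in the text; the others are illustrative assumptions.
FUSION_RULES = {"min": min, "max": max, "mean": mean, "median": median}


def fuse_scores(scores_per_compound, rule="min"):
    """Reduce each compound's single-structure docking scores to one consensus score."""
    fuse = FUSION_RULES[rule]
    return {compound: fuse(scores) for compound, scores in scores_per_compound.items()}


# Hypothetical docking scores against three protein structures (lower is better)
docking = {
    "cpd1": [-7.5, -8.0, -6.5],
    "cpd2": [-9.0, -5.0, -7.0],
}
best = fuse_scores(docking, rule="min")
typical = fuse_scores(docking, rule="median")
```

Here the minimum rule ranks cpd2 first, while the median rule ranks cpd1 first, which shows how the choice of fusion rule can change the outcome of a screen.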

QSAR/QSPR (quantitative structure-activity/property relationship) modeling has been a prevalent approach in various, overlapping sub-fields of computational, medicinal and environmental chemistry for decades. The generation and selection of molecular descriptors is an essential part of this process. In typical QSAR workflows, the starting pool of molecular descriptors is rationalized based on filtering out descriptors which are (i) constant throughout the whole dataset, or (ii) very strongly correlated to another descriptor.
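
The two filtering steps can be sketched in a few lines. This is a minimal illustration rather than the workflow of the paper: the correlation cutoff of 0.95, the greedy keep-the-first strategy, and the names `pearson` and `filter_descriptors` are all assumptions.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def filter_descriptors(descriptors, max_abs_corr=0.95):
    """Drop (i) constant descriptors and (ii) descriptors very strongly
    correlated with one that was already kept (greedy; cutoff is an assumption)."""
    kept = {}
    for name, values in descriptors.items():
        if len(set(values)) == 1:
            continue  # (i) constant throughout the dataset
        if any(abs(pearson(values, other)) > max_abs_corr for other in kept.values()):
            continue  # (ii) redundant with a kept descriptor
        kept[name] = values
    return kept


# Hypothetical descriptor table: one value per molecule
table = {
    "mw": [100, 200, 300, 400],
    "const": [1, 1, 1, 1],
    "mw_scaled": [1.0, 2.0, 3.0, 4.0],
    "logp": [2.0, -1.0, 3.0, 0.5],
}
selected = filter_descriptors(table)
```

With a greedy filter the result depends on the order of the descriptors, since the first member of a correlated pair is the one retained.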

Reversed-phase high-performance liquid chromatography (RP-HPLC) is the most popular chromatographic mode, accounting for more than 90% of all separations. HPLC itself owes its immense popularity to it being relatively simple and inexpensive, with the equipment being reliable and easy to operate. Due to extensive automation, it can be run virtually unattended with multiple samples at various separation conditions, even by relatively low-skilled personnel.
