With the advent and improvement of ontological dictionaries (WordNet, BabelNet), synset-based text representations are gaining popularity in classification tasks. More recently, ontological dictionaries have been used to reduce dimensionality in this kind of representation, as in the Semantic Dimensionality Reduction System (SDRS) (Vélez de Mendizabal et al., 2020).
Despite new developments in machine learning classification techniques, improving the accuracy of spam filtering is a difficult task due to linguistic phenomena that limit its effectiveness. In particular, we highlight polysemy, synonymy, the usage of hypernyms/hyponyms, and the presence of irrelevant/confusing words. These problems should be solved at the pre-processing stage to avoid using inconsistent information in the building of classification models.
Drugs have become an essential part of our lives due to their ability to improve people's health and quality of life. However, for many diseases, approved drugs are not yet available or existing drugs have undesirable side effects, making the pharmaceutical industry strive to discover new drugs and active compounds. The development of drugs is an expensive process, which typically starts with the detection of candidate molecules (screening) after a protein target has been identified.
Purpose: To evaluate the accuracy of relative stopping power and spatial resolution of images reconstructed with simulated helium CT (HeCT) in comparison to proton CT (pCT).
Methods: A Monte Carlo (MC) study with the TOPAS tool was performed to compare the accuracy of relative stopping power (RSP) reconstruction and spatial resolution of low-fluence HeCT to pCT, both using 200 MeV/u particles. An ideal setup consisting of a flat beam source and a totally absorbing energy-range detector was implemented to estimate the theoretically best achievable RSP accuracy for the calibration and reconstruction methods currently used for pCT.
In this work we present the design and implementation of WARCProcessor, a novel multiplatform integrative tool for building scientific datasets to facilitate experimentation in web spam research. The application lets the user specify multiple criteria that change how new corpora are generated, while reducing the number of repetitive and error-prone tasks involved in maintaining existing corpora. To this end, WARCProcessor supports up to six commonly used data sources for web spam research and can store the output corpus in standard WARC format together with complementary metadata files.
Computational simulations offer a powerful tool for quantitatively investigating radiation interactions with biological tissue and can help bridge the gap between physics, chemistry and biology. The TOPAS collaboration is tackling this challenge by extending the current Monte Carlo tool to allow for sub-cellular in silico simulations in a new extension, TOPAS-nBio. TOPAS wraps and extends the Geant4 Monte Carlo simulation toolkit, and the new extension allows the modeling of particles down to vibrational energies (∼2 eV) within realistic biological geometries.