Publications by authors named "Ioannis Vlahavas"

Background: Attention deficit hyperactivity disorder (ADHD) is one of the most common neurodevelopmental disorders during childhood; however, the diagnosis procedure remains challenging, as it is nonstandardized, multiparametric, and highly dependent on subjective evaluation of the perceived behavior.

Objective: To address the challenges of existing procedures for ADHD diagnosis, the ADHD360 project aims to develop a platform for (1) early detection of ADHD by assessing the user's likelihood of having ADHD characteristics and (2) providing complementary training for ADHD management.

Methods: A 2-phase nonrandomized controlled pilot study was designed to evaluate the ADHD360 platform, including ADHD and non-ADHD participants aged 7 to 16 years.


Drug-Drug Interaction (DDI) extraction is the task of identifying drug entities and the potential interactions between drug pairs from biomedical literature. Computer-aided extraction of DDIs is vital for drug discovery, as this process remains extremely expensive and time consuming. Machine learning-based approaches can therefore reduce the manual effort involved in the drug development cycle.


Argument Mining (AM) refers to the task of automatically identifying arguments in a text and finding their relations. In medical literature this is done by identifying Claims and Premises and classifying their relations as either Support or Attack. Evidence-Based Medicine (EBM) refers to the task of identifying all related evidence in medical literature to allow medical practitioners to make informed choices and form accurate treatment plans.


Evidence-Based Medicine (EBM) has been an important practice for medical practitioners. However, as the number of medical publications increases dramatically, it is becoming extremely difficult for medical experts to review all the available content and form an informed treatment plan for their patients. A variety of frameworks, including the PICO framework, which is named after its elements (Population, Intervention, Comparison, Outcome), have been developed to enable fine-grained searches as a first step toward faster decision making.

Article Synopsis
  • Single Nucleotide Polymorphisms (SNPs) are crucial for various biological applications, but their high dimensionality makes feature selection essential for efficient classification and analysis.
  • The paper introduces a new method called FIFS (Frequent Item Feature Selection) that identifies the most informative SNPs by selecting frequent and unique genotypes from genomic data in a modular fashion.
  • Tested on a dataset of British pig breeds, FIFS demonstrated superior performance, achieving over 95% assignment accuracy with only 28 selected SNPs, significantly fewer than other methods used in comparison.

Background: In this paper we present the approach that we employed to deal with large scale multi-label semantic indexing of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge (2013-2017), a challenge concerned with biomedical semantic indexing and question answering.

Methods: Our main contribution is a MUlti-Label Ensemble method (MULE) that incorporates a McNemar statistical significance test in order to validate the combination of the constituent machine learning algorithms.
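For readers unfamiliar with the McNemar test mentioned above, the following is a minimal sketch of its continuity-corrected form, comparing two classifiers on the same examples for a single label. The abstract does not say how MULE applies the test, so this is not the authors' implementation; the function name `mcnemar_test` and its arguments are hypothetical and used only for illustration.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test on paired predictions.

    Hypothetical helper for illustration only; not taken from MULE.
    Returns (chi-square statistic, p-value).
    """
    # b: examples classifier A gets right and classifier B gets wrong.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    # c: examples classifier A gets wrong and classifier B gets right.
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)

    # With no disagreements there is no evidence of a difference.
    if b + c == 0:
        return 0.0, 1.0

    # Chi-square statistic with one degree of freedom.
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square (df=1): P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A small p-value suggests that the two classifiers make systematically different errors on that label, which is the kind of evidence a selection step could use when deciding which constituent models to combine.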


We sought to investigate whether B cell receptor immunoglobulin (BcR IG) stereotypy is associated with particular clinicobiological features among chronic lymphocytic leukemia (CLL) patients expressing mutated BcR IG (M-CLL) encoded by the IGHV4-34 gene, and also ascertain whether these associations could refine prognostication. In a series of 19,907 CLL cases with available immunogenetic information, we identified 339 IGHV4-34-expressing cases assigned to one of the four largest stereotyped M-CLL subsets, namely subsets #4, #16, #29 and #201, and investigated in detail their clinicobiological characteristics and disease outcomes. We identified shared and subset-specific patterns of somatic hypermutation (SHM) among patients assigned to these subsets.

Article Synopsis
  • Advances in biotechnology and health sciences have created a wealth of data, particularly through high throughput genetic data and Electronic Health Records (EHRs), necessitating the use of machine learning and data mining to extract meaningful insights.
  • The study focuses on the application of these technologies in diabetes research across various aspects including prediction, diagnosis, complications, genetic influences, and healthcare management, with prediction and diagnosis being the most explored areas.
  • Machine learning algorithms, mainly supervised ones (85%), are predominantly used, with support vector machines (SVM) being the most effective, showcasing the potential to generate valuable knowledge and new hypotheses in understanding diabetes mellitus (DM).

Background: Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high quality datasets for SHM analysis in IG receptors of CLL patients.


The advent of high-throughput genomic technologies is enabling analyses on thousands or even millions of single-nucleotide polymorphisms (SNPs). At the same time, the selection of a minimum number of SNPs with the maximum information content is becoming increasingly problematic. Available locus ranking programs have been accused of providing upwardly biased results (concerning the predicted accuracy of the chosen set of markers for population assignment); they also cannot handle high-dimensional datasets, and some of them are computationally intensive.


Next-generation sequencing studies in Homo sapiens have identified novel immunoglobulin heavy variable (IGHV) genes and alleles, necessitating changes in the international ImMunoGeneTics information system (IMGT) GENE-DB and reference directories of IMGT/V-QUEST. In chronic lymphocytic leukaemia (CLL), the somatic hypermutation (SHM) status of the clonotypic rearranged IGHV gene is strongly associated with patient outcome. Correct determination of this parameter strictly depends on the comparison of the nucleotide sequence of the clonotypic rearranged IGHV gene with that of the closest germline counterpart.


This chapter presents a method called PolyA-iEP that has been developed for the prediction of polyadenylation sites. More precisely, PolyA-iEP is a method that recognizes mRNA 3' ends that contain polyadenylation sites. It is a modular system that consists of two main components.


Microsatellite loci comprise an important part of eukaryotic genomes. Their applications in biology as genetic markers span numerous fields, ranging from paternity analyses to the construction of genetic maps and linkage to human disease. Existing software solutions that offer pattern discovery algorithms for the correct identification and downstream analysis of microsatellites are scarce and are proving inefficient for analyzing the large and exponentially growing number of sequenced genomes.


The prediction of the translation initiation site in an mRNA or cDNA sequence is an essential step in gene prediction and an open research problem in bioinformatics. Although recent approaches perform well, more effective and reliable methodologies are needed. We developed an adaptable data mining method, called StackTIS, which is modular and consists of three prediction components that are combined into a meta-classification system, using stacked generalization, in a highly effective framework.


The prediction of the translation initiation site in a genomic sequence with the highest possible accuracy is an important problem that still has to be investigated by the research community. Current approaches perform quite well; however, there is still room for a more general framework for researchers who want to follow an effective and reliable methodology. We developed a prediction methodology that combines ad hoc as well as discovered knowledge in order to reliably and significantly increase the achieved accuracy.


Current approaches for mining association rules usually assume that the mining is performed in a static database, where the problem of missing attribute values practically does not exist. However, these assumptions do not hold in some medical databases, such as those of a home care system. In this paper, a novel uncertainty rule algorithm, namely URG-2 (Uncertainty Rule Generator), is presented, which addresses the problem of mining dynamic databases containing missing values.
