Motivation: Attribute selection is a critical step in the development of document classification systems. As a standard practice, words are stemmed and the most informative stems are used as attributes in classification. Owing to the high complexity of biomedical terminology, general-purpose stemming algorithms are often conservative and can also remove informative stems. This can reduce classification accuracy, especially when the number of labeled documents is small. To address this issue, we propose an algorithm that omits stemming and instead uses the most discriminative substrings as attributes.
Results: The approach was tested on five annotated sets of abstracts from iProLINK that report experimental evidence for five types of protein post-translational modifications. The experiments showed that Naive Bayes and support vector machine classifiers perform consistently better [area under the ROC curve (AUC) in the range 0.92-0.97] when using the proposed attribute selection than when using attributes obtained by the Porter stemmer algorithm (AUC in the range 0.86-0.93). The proposed approach is particularly useful when labeled datasets are small.
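As an illustration only, the idea of selecting discriminative substrings instead of stems can be sketched as follows. The abstract does not state how substrings are scored or which lengths are considered, so the smoothed log-odds score, the length range of 3 to 6 characters, and the function names `substrings`, `select_substrings` and `to_features` are all assumptions made for this sketch, not the authors' method.

```python
import math
from collections import Counter


def substrings(text, min_len=3, max_len=6):
    """Return the set of character substrings of the lowercased text."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    }


def select_substrings(docs, labels, k=10, min_len=3, max_len=6):
    """Pick the k substrings whose document frequency differs most by class.

    Assumption: the score is the absolute smoothed log-odds of a substring
    occurring in a positive versus a negative document. The source abstract
    does not specify the scoring criterion.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    for text, y in zip(docs, labels):
        # Count each substring at most once per document.
        (pos_df if y == 1 else neg_df).update(substrings(text, min_len, max_len))

    def score(s):
        p = (pos_df[s] + 1) / (n_pos + 2)  # add-one smoothing
        q = (neg_df[s] + 1) / (n_neg + 2)
        return abs(math.log(p / q))

    vocab = set(pos_df) | set(neg_df)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda s: (-score(s), s))[:k]


def to_features(text, attributes):
    """Binary attribute vector: 1 if the substring occurs in the text."""
    lowered = text.lower()
    return [1 if a in lowered else 0 for a in attributes]
```

The resulting binary vectors could then be passed to any classifier, such as the Naive Bayes or support vector machine models mentioned above. Note that a substring like "phosphorylat" covers both "phosphorylation" and "phosphorylates" without any stemming step.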
DOI: http://dx.doi.org/10.1093/bioinformatics/btl350
Alzheimers Dement
December 2024
Frontotemporal Disorders Unit, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
Background: Posterior Cortical Atrophy (PCA) is a syndrome characterized by a progressive decline in higher-order visuospatial processing, leading to symptoms such as space perception deficit, simultanagnosia, and object perception impairment. While PCA is primarily known for its impact on visuospatial abilities, recent studies have documented language abnormalities in PCA patients. This study aims to delineate the nature and origin of language impairments in PCA, hypothesizing that language deficits reflect the visuospatial processing impairments of the disease.
Brief Bioinform
November 2024
Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via E. Orabona 4, 70126, Bari, Italy.
The advent of high-throughput sequencing (HTS) technologies unlocked the complexity of the microbial world through the development of metagenomics, which now provides an unprecedented and comprehensive overview of its taxonomic and functional contribution in a huge variety of macro- and micro-ecosystems. In particular, shotgun metagenomics allows the reconstruction of microbial genomes, through the assembly of reads into MAGs (metagenome-assembled genomes). In fact, MAGs represent an information-rich proxy for inferring the taxonomic composition and the functional contribution of microbiomes, even if the relevant analytical approaches are not trivial and still improvable.
BMC Infect Dis
January 2025
Centre de Recherche et de Formation en Infectiologie de Guinée (CERFIG), Université Gamal Abder Nasser de Conakry, Conakry, Guinea.
Background: Several variants of SARS-CoV-2 have a demonstrated impact on public health, including high and increased transmissibility, severity of infection, and immune escape. Therefore, this study aimed to determine the SARS-CoV-2 lineages and better characterize the dynamics of the pandemic during the different waves in Guinea.
Methods: Whole genome sequencing of 363 samples with PCR cycle threshold (Ct) values under thirty was undertaken between May 2020 and May 2023.
Sci Rep
January 2025
Department of Computer Science, Sri Guru Gobind Singh College of Commerce, University of Delhi, Delhi, India.
Domain-specific vocabulary, which is crucial in fields such as Information Retrieval and Natural Language Processing, requires continuous updates to remain effective. Incremental Learning, unlike conventional methods, updates existing knowledge without retraining from scratch. This paper presents an incremental learning algorithm for updating domain-specific vocabularies.
Inj Prev
January 2025
Pediatrics, University of Washington School of Medicine, Seattle, Washington, USA.
Introduction: George Floyd's death in 2020 galvanised large protests around the country, including the emergence of the Capitol Hill Autonomous Zone (CHAZ) in Seattle, Washington, a non-policed, organised protest region that may have differing injury risks than other regions. We sought to quantitatively describe characteristics of injuries related to protests documented at visits to two nearby major emergency departments, including the only Level 1 trauma centre in the state.
Methods: Using the International Classification of Diseases, 10th Revision code inclusion criteria, we identified 1938 unique patient visits across the two emergency departments between 29 May 2020 and 1 July 2020.