Publications by authors named "Joao Rafael Almeida"

Metagenomics is a rapidly expanding field that uses next-generation sequencing technology to analyze the genetic makeup of environmental samples. However, accurately identifying the organisms in a metagenomic sample can be complex, and traditional reference-based methods may not be effective enough in some instances. In this study, we present a novel approach for metagenomic identification, using data compressors as a feature for taxonomic classification.

Single Sign-On (SSO) methods are the primary solution to authenticate users across multiple web systems. These mechanisms streamline the authentication procedure by avoiding duplicate development of authentication modules for each application. They also provide convenience to the end-user by keeping the user authenticated when switching between different contexts.

A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power.

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail.

Background: Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle for performing a successful observational study is to define the research question and the approach before executing the study. However, in multi-centre studies, finding suitable datasets to support the study is challenging, time-consuming, and sometimes impossible without a deep understanding of each dataset.

Biomedical databases often have restricted access policies and governance rules. Thus, an adequate description of their content is essential for researchers who wish to use them for medical research. A strategy for publishing information without disclosing patient-level data is through database fingerprinting and aggregate characterisations.

In recent decades, the field of metagenomics, aided by NGS technologies, has grown exponentially and is now a cornerstone tool in medicine. However, even with the current technologies, obtaining a conclusive identification of an organism can be challenging because of the reliance on reference-based methods. Consequently, when releasing a new repository of genomic data that contains de-novo sequences, it is problematic to characterize its content.

Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, this becomes a more serious issue. A simpler approach to avoid inadequate disclosure is to ensure that all data that can be associated directly with an individual is removed from the original dataset.

Article Synopsis
  • The study aimed to create and validate prediction models to identify rheumatoid arthritis (RA) patients at high risk for adverse health outcomes while starting first-line methotrexate (MTX) treatment.
  • Data from 15 different claims and health record databases across 9 countries were analyzed, focusing on risks for various conditions at different time frames (3 months, 2 years, and 5 years) after treatment initiation.
  • The models showed good performance in predicting serious infections, myocardial infarction, and stroke, indicating potential for practical clinical application in monitoring RA patients on MTX.

Many clinical studies are greatly dependent on an efficient identification of relevant datasets. This selection can be performed in existing health data catalogues, by searching for available metadata. The search process can be optimised through question-answering interfaces, to help researchers explore the available data.

Background: The content of the clinical notes that have been continuously collected throughout patients' health histories has the potential to provide relevant information about treatments and diseases, and to increase the value of structured data available in Electronic Health Records (EHR) databases. EHR databases are currently being used in observational studies, which lead to important findings in medical and biomedical sciences. However, the information present in clinical notes is not being used in those studies, since the computational analysis of this unstructured data is much more complex than that of structured data.

The process of refining the research question in a medical study depends greatly on the current background of the investigated subject. The information found in prior works can directly impact several stages of the study, namely the cohort definition stage. Besides previously published methods, researchers could also leverage other materials, such as the output of cohort selection tools, to enrich and accelerate their own work.

Background: Electronic health records store large amounts of patient clinical data. Despite efforts to structure patient data, clinical notes containing rich patient information remain stored as free text, greatly limiting their exploitation. This includes family history, which is highly relevant for applications such as diagnosis and prognosis.

Privacy issues, often raised by the multidimensionality and sensitivity of the data and by the associated access restrictions and policies, limit the analysis and cross-exploration of most distributed and private biobanks. These characteristics prevent collaboration between entities, constituting a barrier to emergent personalized and public health challenges, namely the discovery of new druggable targets, identification of disease-causing genetic variants, or the study of rare diseases. In this paper, we propose a semi-automatic methodology for the analysis of distributed and private biobanks.

Aiming to better understand the genetic and environmental associations of Alzheimer's disease, many clinical trials and scientific studies have been conducted. However, these studies are often based on a small number of participants. To address this limitation, there is an increasing demand for multi-cohort studies, which can provide higher statistical power and clinical evidence.

Electronic health records contain valuable information on patients' clinical history in the form of free text. Manually analyzing millions of these documents is unfeasible, and automatic natural language processing methods are essential for efficiently exploiting these data. In this context, normalization of clinical entities, where the aim is to link entity mentions to reference vocabularies, is of utmost importance to successfully extract knowledge from clinical narratives.

Background: Many healthcare databases have been routinely collected over the past decades to support clinical practice and administrative services. However, their secondary use for research is often hindered by restrictive governance rules. Furthermore, health research studies typically involve many participants with complementary roles and responsibilities, which require proper process management.

In recent decades, the number of medical imaging studies and the volume of associated metadata have been rapidly increasing. Although these studies are mostly used to support medical diagnosis and treatment, many recent initiatives advocate using them in clinical research scenarios and to improve the business practices of medical institutions. However, the continuous production of medical imaging studies, coupled with the tremendous amount of associated data, makes the real-time analysis of medical imaging repositories difficult using conventional tools and methodologies.
