Landsat and Sentinel-2 acquisitions are among the most widely used medium-resolution optical data adopted for terrestrial vegetation applications, such as land cover and land use mapping, vegetation condition and phenology monitoring, and disturbance and change mapping. When combined, both data archives provide over 40 years, and counting, of continuous and consistent observations. Although the spatio-temporal availability of both data archives is well-known at the scene level, information on the actual availability of cloud-, snow-, and shade-free observations at the pixel level is lacking and should be explored individually for each study to correctly parametrize subsequent analyses.
Motivation: With the exponential growth of the life sciences literature, biomedical text mining (BTM) has become an essential technology for accelerating the extraction of insights from publications. The identification of entities in texts, such as diseases or genes, and their normalization, i.e.
Motivation: Biomedical entity linking (BEL) is the task of grounding entity mentions to a given knowledge base (KB). Recently, neural name-based methods, i.e., systems that identify the most appropriate name in the KB for a given mention using neural networks (either via dense retrieval or autoregressive modeling), achieved remarkable results for the task, without requiring manual tuning or definition of domain/entity-specific rules. However, as name-based methods directly return KB names, they cannot cope with homonyms, i.
Background: Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages.
Motivation: In precision oncology (PO), clinicians aim to find the best treatment for any patient based on their molecular characterization. A major bottleneck is the manual annotation and evaluation of individual variants, for which usually a range of knowledge bases are screened. To incorporate and integrate the vast information of different databases, fast and accurate methods for harmonizing databases with different types of information are necessary.
Aim: We aimed to evaluate the applicability of a customized NanoString panel for molecular subtyping of recurrent or metastatic head and neck squamous cell carcinoma (R/M-HNSCC). Additionally, histological analyses were conducted, correlated with the molecular subtypes and tested for their prognostic value.
Material And Methods: We conducted molecular subtyping of R/M-HNSCC according to the molecular subtypes defined by Keck et al.
Introduction: Natural language processing (NLP) lies at the intersection of computer science and linguistics and aims to enable machines to process and understand human language. Here, we summarize applications and limitations of NLP in dentistry.
Data And Sources: Narrative review.
Importance: Clinical interpretation of complex biomarkers for precision oncology currently requires manual investigations of previous studies and databases. Conversational large language models (LLMs) might be beneficial as automated tools for assisting clinical decision-making.
Objective: To assess the performance of 4 recent LLMs as support tools for precision oncology and to define their role.
Motivation: Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups, which makes comparisons based on published numbers problematic.
Summary: Relation extraction (RE) from large text collections is an important tool for database curation, pathway reconstruction, or functional omics data analysis. In practice, RE often is part of a complex data analysis pipeline requiring specific adaptations like restricting the types of relations or the set of proteins to be considered. However, current systems are either non-programmable web sites or research code with fixed functionality.
Pancreatic neuroendocrine neoplasms (panNENs) are a rare yet diverse type of neoplasia whose precise clinical-pathological classification is frequently challenging. Since incorrect classifications can affect treatment decisions, additional tools which support the diagnosis, such as machine learning (ML) techniques, are critically needed but generally unavailable due to the scarcity of suitable ML training data for rare panNENs. Here, we demonstrate that a multi-step ML framework predicts clinically relevant panNEN characteristics while being exclusively trained on widely available data of a healthy origin.
The identification of chemical-protein interactions described in the literature is an important task with applications in drug design, precision medicine and biotechnology. Manual extraction of such relationships from the biomedical literature is costly and often prohibitively time-consuming. The BioCreative VII DrugProt shared task provides a benchmark for methods for the automated extraction of chemical-protein relations from scientific text.
High-throughput technologies led to the generation of a wealth of data on regulatory DNA elements in the human genome. However, results from disease-driven studies are primarily shared in textual form as scientific articles. Information extraction (IE) algorithms allow this information to be (semi-)automatically accessed.
Machine learning (ML) approaches have demonstrated the ability to predict molecular spectra at a fraction of the computational cost of traditional theoretical chemistry methods while maintaining high accuracy. Graph neural networks (GNNs) are particularly promising in this regard, but different types of GNNs have not yet been systematically compared. In this work, we benchmark and analyze five different GNNs for the prediction of excitation spectra from the QM9 dataset of organic molecules.
Background: Pancreatic neuroendocrine neoplasms (PanNENs) fall into two subclasses: the well-differentiated, low- to high-grade pancreatic neuroendocrine tumors (PanNETs), and the poorly-differentiated, high-grade pancreatic neuroendocrine carcinomas (PanNECs). While recent studies suggest an endocrine descent of PanNETs, the origin of PanNECs remains unknown.
Methods: We performed DNA methylation analysis for 57 PanNEN samples and found that distinct methylation profiles separated PanNENs into two major groups, clearly distinguishing high-grade PanNECs from other PanNETs including high-grade NETG3.
Today's scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters.
Background: The clinical management of high-grade gastroenteropancreatic neuroendocrine neoplasms (GEP-NEN) is challenging due to disease heterogeneity, illustrating the need for reliable biomarkers facilitating patient stratification and guiding treatment decisions. FMS-like tyrosine kinase 3 ligand (Flt3L) is emerging as a prognostic or predictive surrogate marker of host tumoral immune response and might enable the stratification of patients with otherwise comparable tumor features.
Methods: We evaluated Flt3L gene expression in tumor tissue as well as circulating Flt3L levels as potential biomarkers in a cohort of 54 patients with GEP-NEN.
High-throughput technologies have led to a continuously growing amount of information about regulatory features in the genome. A wealth of data generated by large international research consortia is available from online databases. Disease-driven studies provide details on specific DNA elements or epigenetic modifications regulating gene expression in specific cellular and developmental contexts, but these results are usually only published in scientific articles.
Objective: We present the Berlin-Tübingen-Oncology corpus (BRONCO), a large and freely available corpus of shuffled sentences from German oncological discharge summaries annotated with diagnosis, treatments, medications, and further attributes including negation and speculation. The aim of BRONCO is to foster reproducible and openly available research on Information Extraction from German medical texts.
Materials And Methods: BRONCO consists of 200 manually deidentified discharge summaries of cancer patients.
Summary: Named entity recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, be highly accurate and be robust toward variations in text genre and style. We present HunFlair, a NER tagger fulfilling these requirements.
Bioinformatics
April 2021
Motivation: The automatic extraction of published relationships between molecular entities has important applications in many biomedical fields, ranging from Systems Biology to Personalized Medicine. Existing works focused on extracting relationships described in single articles or in single sentences. However, a single record is rarely sufficient to judge upon the biological correctness of a relation, as experimental evidence might be weak or only valid in a certain context.
Lesion-based targeting strategies underlie cancer precision medicine. However, biological principles - such as cellular senescence - remain difficult to implement in molecularly informed treatment decisions. Functional analyses in syngeneic mouse models and cross-species validation in patient datasets might uncover clinically relevant genetics of biological response programs.
Motivation: A significant portion of molecular biology investigates signalling pathways and thus depends on an up-to-date and complete resource of functional protein-protein associations (PPAs) that constitute such pathways. Despite extensive curation efforts, major pathway databases are still notoriously incomplete. Relation extraction can help to gather such pathway information from biomedical publications.
Background: Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information regarding the association of variations to disease progression and possible interventions. Clinicians to a large degree use biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact.
Detection of epithelial ovarian cancer (EOC) poses a critical medical challenge. However, novel biomarkers for diagnosis remain to be discovered. Therefore, innovative approaches are of the utmost importance for patient outcome.