Automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm that types all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forums) show performance comparable to that of state-of-the-art supervised typing systems trained on large amounts of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
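As a rough illustration of the idea of clustering mentions over combined representations, the sketch below concatenates three toy vectors per mention and groups mentions by average-linkage agglomerative clustering on cosine similarity. It is not the article's joint clustering and linking algorithm: the linking step is omitted, and the vectors, the `combine` and `agglomerative` helpers, and the 0.8 threshold are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combine(general, context, knowledge):
    """Concatenate the three per-mention representations (an assumed
    combination scheme; the abstract does not say how they are merged)."""
    return list(general) + list(context) + list(knowledge)


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters until the best
    average pairwise cosine similarity falls below `threshold`.
    Returns clusters as lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -2.0, None  # cosine is never below -1
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best:
                    best, pair = score, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy, hand-made vectors: (general, context, knowledge) per mention.
mentions = {
    "Paris": ([1.0, 0.0], [0.9, 0.1], [1.0, 0.0]),
    "London": ([0.9, 0.1], [1.0, 0.0], [1.0, 0.0]),
    "aspirin": ([0.0, 1.0], [0.1, 0.9], [0.0, 1.0]),
    "ibuprofen": ([0.1, 0.9], [0.0, 1.0], [0.0, 1.0]),
}
names = list(mentions)
vectors = [combine(*mentions[name]) for name in names]
clusters = agglomerative(vectors, threshold=0.8)
typed = [sorted(names[i] for i in cluster) for cluster in clusters]
print(typed)
```

With these toy inputs the two place names end up in one cluster and the two drug names in another; in a real system, each cluster would still need a type label, which the article obtains through linking to knowledge bases.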
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374868 | PMC |
http://dx.doi.org/10.1089/big.2017.0012 | DOI Listing |
Infect Control Hosp Epidemiol
December 2024
Department of Infectious Diseases, Infection Control, and Employee Health, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Objective: Whole genome sequencing (WGS) can help identify transmission of pathogens causing healthcare-associated infections (HAIs). However, the current gold standard of short-read, Illumina-based WGS is labor and time intensive. Given recent improvements in long-read Oxford Nanopore Technologies (ONT) sequencing, we sought to establish a low resource approach providing accurate WGS-pathogen comparison within a time frame allowing for infection prevention and control (IPC) interventions.
Antonie Van Leeuwenhoek
December 2024
GIPhy - Genome Informatics and Phylogenetics, Biological Resource Center of Institut Pasteur, Institut Pasteur, Université de Paris, 75015, Paris, France.
A Gram-staining-positive, aerobic bacterium, designated strain JJ-181, was isolated from the root surface of soybean. Based on the 16S rRNA gene sequence similarities, strain JJ-181 was grouped into the genus Cohnella, most closely related to Cohnella hashimotonis F6_2S_P_1 (98.85%) and C.
Antonie Van Leeuwenhoek
December 2024
Department of Systems Biotechnology, Chung-Ang University, Anseong, 17546, South Korea.
A Gram-stain-negative, aerobic, non-spore-forming, non-motile, coccus-shaped, and red-pigmented bacterial strain designated as CJ14 was isolated from lettuce cultivation soil in Yong-In, South Korea. Strain CJ14 grew optimally on Luria-Bertani agar at 37 ℃ and pH 7.0 in the absence of NaCl.
Clin Rheumatol
December 2024
Immunology and Histocompatibility Department, Hedi Chaker University Hospital, Sfax, Tunisia.
Introduction/objectives: Psoriatic arthritis (PsA) is a chronic inflammatory rheumatism belonging to the spondyloarthritis family. It is a multifactorial disease whose genetic determinism is still poorly understood. It is favored by environmental factors in genetically predisposed individuals.
Microbiol Spectr
December 2024
Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies (SIBPT), Fudan University, Shanghai, China.
The gene , encoding the mannitol transporter subunit IICBA of the phosphotransferase system, was the core gene with the greatest variability in and could be used as a new typing marker in . To expand its application, we performed an evolutionary analysis and found that the gene was present in nine phyla, 371 genera, and 1,662 species of bacteria. It is commonly found in pathogenic species of , followed by , , etc.