Automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm that types all mentions using these representations. The framework does not rely on any annotated data, predefined typing schema, or handcrafted features; it can therefore be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forum) show performance comparable to that of state-of-the-art supervised typing systems trained on large amounts of labeled data. Results on several languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of the framework.
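As a rough illustration of the idea of clustering mentions over several representations, the sketch below concatenates three toy vectors per mention and groups mentions by average-linkage agglomerative clustering with cosine similarity. It is not the article's algorithm: the concatenation, the similarity threshold, the toy vectors, and the names `combine` and `agglomerate` are all assumptions made for illustration, and the linking step to a knowledge base is omitted.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combine(general, context, knowledge):
    """Hypothetical combination step: plain concatenation of the three views."""
    return list(general) + list(context) + list(knowledge)


def agglomerate(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters until the best
    remaining similarity drops below `threshold` (an assumed stopping rule).
    Returns clusters as sorted lists of mention indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (x, y), best = max(
            (((a, b), avg_sim(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy mentions with made-up (general, context, knowledge) vectors.
mentions = {
    "Paris": ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    "Lagos": ([0.9, 0.1], [1.0, 0.0], [1.0, 0.0]),
    "aspirin": ([0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
    "ibuprofen": ([0.1, 0.9], [0.0, 1.0], [0.0, 1.0]),
}
names = list(mentions)
vectors = [combine(*mentions[name]) for name in names]
clusters = agglomerate(vectors, threshold=0.5)
grouped = [[names[i] for i in cluster] for cluster in clusters]
```

With these toy vectors, the two place names end up in one cluster and the two drug names in another. No type labels are assigned here; in the framework described above, naming the clusters would depend on the linking step that this sketch leaves out.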

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374868
DOI: http://dx.doi.org/10.1089/big.2017.0012

