Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.
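The abstract does not spell out how the neighborhood-overlap confidence measure is computed. As a rough illustration only, one plausible form is sketched below. It assumes several embedding replicates trained on the same corpus (for example, with different random seeds) and scores a term by the mean pairwise Jaccard overlap of its k nearest neighbors across replicates. The function names, the choice of cosine similarity and Jaccard overlap, and the default `k` are all assumptions made for this sketch, not the authors' definition.

```python
import numpy as np


def nearest_neighbors(embeddings, index, k):
    """Return the indices of the k nearest rows to row `index` by cosine similarity."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[index]
    sims[index] = -np.inf  # exclude the query term itself
    return set(np.argsort(-sims)[:k].tolist())


def neighborhood_overlap_confidence(replicates, index, k=5):
    """Hypothetical confidence score: mean pairwise Jaccard overlap of the
    k-nearest-neighbor sets of one term across embedding replicates.

    `replicates` is a list of (vocab_size, dim) arrays sharing one row order.
    """
    neighbor_sets = [nearest_neighbors(e, index, k) for e in replicates]
    scores = []
    for i in range(len(neighbor_sets)):
        for j in range(i + 1, len(neighbor_sets)):
            a, b = neighbor_sets[i], neighbor_sets[j]
            scores.append(len(a & b) / len(a | b))
    return float(np.mean(scores))
```

Under this assumed definition, a score near 1 means the term's neighborhood is stable across replicates, while a score near 0 suggests the embedding is too noisy to support corpus comparison.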

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8212692
