Word2vec, introduced by Mikolov et al., is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings with numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
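To make the phrase "spectral method" concrete, the following is a minimal sketch of a generic spectral word embedding. The abstract does not specify the paper's construction, so everything here is an assumption for illustration: the toy corpus, the window size, the positive PMI weighting, and the hypothetical helper names `cooccurrence_matrix` and `spectral_embedding`. It should not be read as the authors' method.

```python
import numpy as np


def cooccurrence_matrix(tokens, window=2):
    """Count symmetric co-occurrences within a fixed window (assumed setup)."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:
                counts[index[word], index[tokens[ctx]]] += 1.0
    return vocab, counts


def spectral_embedding(counts, dim=2):
    """Embed words with the leading eigenvectors of a positive PMI matrix.

    The PPMI weighting is an illustrative choice, not taken from the paper.
    """
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total
    # log(0) yields -inf for unseen pairs; those entries are clipped to 0 below.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    # The matrix is symmetric, so eigh applies; keep the largest-magnitude modes.
    eigvals, eigvecs = np.linalg.eigh(ppmi)
    order = np.argsort(-np.abs(eigvals))[:dim]
    return eigvecs[:, order] * np.sqrt(np.abs(eigvals[order]))


# Toy corpus, purely illustrative.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab, counts = cooccurrence_matrix(tokens, window=2)
emb = spectral_embedding(counts, dim=2)
for word, vec in zip(vocab, emb):
    print(word, np.round(vec, 3))
```

Unlike word2vec, this sketch involves no iterative nonlinear optimization: the embedding comes from a single eigendecomposition, which is the kind of structure that makes provable guarantees more tractable.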
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425479 | PMC |
| http://dx.doi.org/10.3389/fams.2020.593406 | DOI Listing |
Sci Rep
December 2024
SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark.
Advances in technology have increased the volume of digital data in many fields, including medication-related texts. Sentiment Analysis (SA) in medication is essential to give clinicians insights into patients' feedback about the treatment procedure. Therefore, this study intends to develop Artificial Intelligence (AI) models to predict patients' sentiments.
Bioact Mater
April 2025
Department of Oral and Cranio-maxillofacial Surgery, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology; Shanghai Research Institute of Stom, Shanghai, 200011, China.
Angiogenesis is imperative for bone regeneration, yet conventional cytokine therapies have been constrained by prohibitive costs and safety concerns. A safer and more efficient therapeutic alternative is urgently needed. Herein, utilizing the methodologies of Deep Learning (DL) and Natural Language Processing (NLP), we proposed a paradigm algorithm that amalgamates with a variant, , to deftly discern potential pro-angiogenic peptides from intrinsically disordered regions (IDRs) of 262 related proteins, which are fertile grounds for developing safer and highly promising bioactive peptides.
J Am Med Inform Assoc
December 2024
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, United States.
Objective: To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques.
Materials And Methods: We first created a lexicon and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicon was further extended using Word2Vec and GPT 3.
Methods
December 2024
Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand. Electronic address:
Appl Psychol Health Well Being
February 2025
School of Economics and Management, China University of Mining and Technology, Xuzhou, China.
The diffusion of high-level risk perception caused by public health emergencies seriously threatens public mental health and social stability. Much scholarly attention has focused on traditional epidemic models or has simply combined content and social attributes, overlooking differences in individual characteristics among the public. This paper proposes an SSEIIIR model of risk perception diffusion by subdividing susceptible people and infectious people.