Fracture Mechanics Method for Word Embedding Generation of Neural Probabilistic Linguistic Model.

Comput Intell Neurosci

Institute of Electronics, Chinese Academy of Sciences, Beijing, China.

Published: February 2017

Word embedding, a lexical vector representation generated via the neural linguistic model (NLM), has been empirically shown to improve the performance of traditional language models. However, the high dimensionality inherent in NLMs leads to a heavy hyperparameter-tuning burden and long training times. Here, we propose a force-directed method that alleviates these problems and simplifies the generation of word embeddings. In this framework, each word is treated as a point in the real world, so its movement can be approximately simulated according to certain laws of mechanics. To simulate the variation of meaning in phrases, we use fracture mechanics to model the formation and breakdown of the meaning carried by a 2-gram word group. In experiments on the natural language tasks of part-of-speech tagging, named entity recognition, and semantic role labeling, the results show that the 2-dimensional word embedding can rival the word embeddings generated by classic NLMs in terms of accuracy, recall, and text visualization.
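The abstract does not give the force equations, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that words are points in a 2-dimensional plane, that each distinct 2-gram forms a spring-like bond whose stiffness is its count, that all points weakly repel each other, and that a bond "fractures" (is removed) once it is stretched beyond a threshold. The function names and all parameters (`rest_len`, `break_len`, `repulsion`, `lr`) are hypothetical choices made for this sketch.

```python
from collections import Counter

import numpy as np


def bigram_counts(tokens):
    """Count adjacent word pairs (2-grams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def force_directed_embedding(tokens, steps=200, lr=0.05, rest_len=0.5,
                             break_len=3.0, repulsion=0.01, seed=0):
    """Toy 2-D force-directed layout of a vocabulary (assumed model).

    Returns the sorted vocabulary, an array of 2-D positions, and the
    bonds (index pairs) that survived without breaking.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(len(vocab), 2))

    # One bond per distinct 2-gram of different words; stiffness = count.
    bonds = {(index[a], index[b]): c
             for (a, b), c in bigram_counts(tokens).items() if a != b}

    for _ in range(steps):
        force = np.zeros_like(pos)

        # Weak repulsion between every pair of points (~ 1 / distance).
        diff = pos[:, None, :] - pos[None, :, :]      # diff[i, j] = p_i - p_j
        dist2 = (diff ** 2).sum(axis=-1) + 1e-6       # avoid division by zero
        force += repulsion * (diff / dist2[..., None]).sum(axis=1)

        # Spring attraction along bonds; over-stretched bonds break.
        broken = []
        for (i, j), stiffness in bonds.items():
            d = pos[j] - pos[i]
            length = np.linalg.norm(d) + 1e-9
            if length > break_len:
                broken.append((i, j))
                continue
            f = stiffness * (length - rest_len) * d / length
            force[i] += f
            force[j] -= f
        for key in broken:
            del bonds[key]

        # Clip the force so that a single step cannot blow up the layout.
        pos += lr * np.clip(force, -1.0, 1.0)

    return vocab, pos, bonds


if __name__ == "__main__":
    text = "the cat sat on the mat the dog sat on the rug".split()
    words, coords, alive = force_directed_embedding(text)
    for w, (x, y) in zip(words, coords):
        print(f"{w:>4s}  {x:+.3f}  {y:+.3f}")
    print("surviving bonds:", len(alive))
```

In this sketch the resulting coordinates serve directly as 2-dimensional word vectors and can be plotted without any dimensionality reduction, which is the kind of text visualization the abstract alludes to.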

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5029052
DOI: http://dx.doi.org/10.1155/2016/3506261

Similar Publications

Context-dependent similarity analysis of analogue series for structure-activity relationship transfer based on a concept from natural language processing.

J Cheminform

January 2025

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.

Atsushi Yoshimori Jürgen Bajorath

Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure-activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing.

Statistical learning of artificial orthographic regularity arises from coordinated activity across distinct brain regions.

Neuroscience

January 2025

Human Communication, Learning, and Development, Faculty of Education, The University of Hong Kong, China.

Xiuhong Tong Yating Lv Tiantian Wang Rujun Duan Shelley Xiuli Tong

The human brain possesses the ability to automatically extract statistical regularities from environmental inputs, including visual-graphic symbols and printed units. However, the specific brain regions underlying the statistical learning of these visual-graphic symbols or artificial orthography remain unclear. This study utilized functional magnetic resonance imaging (fMRI) with an artificial orthography learning paradigm to measure brain activities associated with the statistical learning of radical positional regularities embedded in pseudocharacters containing high (100%), moderate (80%), and low (60%) levels of consistency, along with a series of random abstract figures.

The Role of Morphological Information in Processing Pseudo-words in Italian L2 Learners: It's a Matter of Experience.

J Cogn

January 2025

Department of Humanities, University of Trento, via Tommaso Gar 14, 38122, Trento, Italy.

Simona Amenta Francesca Foppolo Linda Badan

The productive use of morphological information is considered one of the possible ways in which speakers of a language understand and learn unknown words. In the present study we investigate whether, and how, adult L2 learners also exploit morphological information to process unknown words by analyzing the impact of language proficiency on the processing of novel derivations. Italian L2 learners, divided into three proficiency groups, participated in a lexical decision task where pseudo-words could embed existing stems (e.

Uncertainty estimation in diagnosis generation from large language models: next-word probability is not pre-test probability.

JAMIA Open

February 2025

Department of Medicine, University of Wisconsin-Madison, Madison, WI 53792, United States.

Yanjun Gao Skatje Myers Shan Chen Dmitriy Dligach Timothy Miller

Objective: To evaluate large language models (LLMs) for pre-test diagnostic probability estimation and compare their uncertainty estimation performance with a traditional machine learning classifier.

Materials and Methods: We assessed 2 instruction-tuned LLMs, Mistral-7B-Instruct and Llama3-70B-chat-hf, on predicting binary outcomes for Sepsis, Arrhythmia, and Congestive Heart Failure (CHF) using electronic health record (EHR) data from 660 patients. Three uncertainty estimation methods (Verbalized Confidence, Token Logits, and LLM Embedding+XGB) were compared against an eXtreme Gradient Boosting (XGB) classifier trained on raw EHR data.

Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique.

Comput Biol Med

January 2025

Thai Nguyen University of Information and Communication Technology, Thai Nguyen City, Viet Nam.

Thi-Xuan Tran Nguyen Quoc Khanh Le Van-Nui Nguyen

Protein succinylation, a post-translational modification wherein a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging.
