Word embedding, a lexical vector representation generated by a neural linguistic model (NLM), has been shown empirically to improve the performance of traditional language models. However, the high dimensionality inherent in NLMs leads to problems with hyperparameter tuning and long training times. Here, we propose a force-directed method that mitigates these problems and simplifies the generation of word embeddings. In this framework, each word is treated as a point in the real world, so its movement can be approximately simulated according to certain mechanics. To simulate the variation of meaning in phrases, we use fracture mechanics to model the formation and breakdown of meaning combined by a 2-gram word group. In experiments on the natural language tasks of part-of-speech tagging, named entity recognition, and semantic role labeling, the 2-dimensional word embedding rivaled the word embeddings generated by classic NLMs in terms of accuracy, recall, and text visualization.
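The general idea of placing words as points that move under forces can be illustrated with a generic force-directed layout. The sketch below is not the authors' algorithm: it assumes simple spring attraction between words that co-occur in a 2-gram and inverse-distance repulsion between all words, and it omits the fracture-mechanics step entirely. The function names (`bigram_counts`, `force_directed_embedding`) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs (2-grams), ignoring order within a pair."""
    pairs = Counter()
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            pairs[tuple(sorted((left, right)))] += 1
    return pairs


def force_directed_embedding(tokens, steps=200, step_size=0.05,
                             attract=0.1, repel=0.01, max_move=0.1, seed=0):
    """Return {word: (x, y)} from a generic spring/repulsion layout.

    Assumption: attraction is proportional to 2-gram count and distance,
    repulsion falls off with inverse distance. These are illustrative
    choices, not the force model of the paper.
    """
    rng = random.Random(seed)
    words = sorted(set(tokens))
    pos = {w: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for w in words}
    pairs = bigram_counts(tokens)

    for _ in range(steps):
        force = {w: [0.0, 0.0] for w in words}

        # Repulsion between every pair of words keeps the layout from collapsing.
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist2 = dx * dx + dy * dy + 1e-9
                fx = repel * dx / dist2
                fy = repel * dy / dist2
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy

        # Spring attraction pulls words that share a 2-gram toward each other.
        for (a, b), count in pairs.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            fx = attract * count * dx
            fy = attract * count * dy
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy

        # Move each word along its net force, capping the step for stability.
        for w in words:
            mx = step_size * force[w][0]
            my = step_size * force[w][1]
            norm = math.hypot(mx, my)
            if norm > max_move:
                mx = mx * max_move / norm
                my = my * max_move / norm
            pos[w][0] += mx
            pos[w][1] += my

    return {w: (p[0], p[1]) for w, p in pos.items()}


if __name__ == "__main__":
    corpus = "the cat sat on the mat".split()
    for word, (x, y) in force_directed_embedding(corpus).items():
        print(f"{word}: ({x:.3f}, {y:.3f})")
```

Each word ends up with a 2-dimensional coordinate that could be plotted directly or passed to a downstream tagger as a feature. A fixed seed makes the layout reproducible, and the cap on per-step movement is a simple guard against instability when two points start very close together.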
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5029052 | PMC |
http://dx.doi.org/10.1155/2016/3506261 | DOI Listing |
J Cheminform
January 2025
Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.
Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure-activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-depending similarity assessment based on vector embeddings adapted from natural language processing.
Neuroscience
January 2025
Human Communication, Learning, and Development, Faculty of Education, The University of Hong Kong, China.
The human brain possesses the ability to automatically extract statistical regularities from environmental inputs, including visual-graphic symbols and printed units. However, the specific brain regions underlying the statistical learning of these visual-graphic symbols or artificial orthography remain unclear. This study utilized functional magnetic resonance imaging (fMRI) with an artificial orthography learning paradigm to measure brain activities associated with the statistical learning of radical positional regularities embedded in pseudocharacters containing high (100%), moderate (80%), and low (60%) levels of consistency, along with a series of random abstract figures.
J Cogn
January 2025
Department of Humanities, University of Trento, via Tommaso Gar 14, 38122, Trento, Italy.
The productive use of morphological information is considered one of the possible ways in which speakers of a language understand and learn unknown words. In the present study we investigate if, and how, also adult L2 learners exploit morphological information to process unknown words by analyzing the impact of language proficiency in the processing of novel derivations. Italian L2 learners, divided into three proficiency groups, participated in a lexical decision where pseudo-words could embed existing stems (e.
JAMIA Open
February 2025
Department of Medicine, University of Wisconsin-Madison, Madison, WI 53792, United States.
Objective: To evaluate large language models (LLMs) for pre-test diagnostic probability estimation and compare their uncertainty estimation performance with a traditional machine learning classifier.
Materials and Methods: We assessed 2 instruction-tuned LLMs, Mistral-7B-Instruct and Llama3-70B-chat-hf, on predicting binary outcomes for Sepsis, Arrhythmia, and Congestive Heart Failure (CHF) using electronic health record (EHR) data from 660 patients. Three uncertainty estimation methods (Verbalized Confidence, Token Logits, and LLM Embedding+XGB) were compared against an eXtreme Gradient Boosting (XGB) classifier trained on raw EHR data.
Comput Biol Med
January 2025
Thai Nguyen University of Information and Communication Technology, Thai Nguyen City, Viet Nam. Electronic address:
Protein succinylation, a post-translational modification wherein a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging.