Supervised and traditional term weighting methods for automatic text categorization.

IEEE Trans Pattern Anal Mach Intell

Department of Computer Science and Technology, East China Normal University, Shanghai, China.

Published: April 2009

In the vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e., words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text. Term weighting methods assign appropriate weights to the terms to improve the performance of text categorization. In this study, we investigate several widely used unsupervised (traditional) and supervised term weighting methods on benchmark data collections in combination with SVM and kNN algorithms. Taking into account the distribution of relevant documents in the collection, we propose a new, simple supervised term weighting method, tf.rf, to improve the terms' discriminating power for the text categorization task. In the controlled experiments, the supervised term weighting methods show mixed performance. Specifically, our proposed supervised method, tf.rf, performs consistently better than the other term weighting methods, while the other supervised methods, which are based on information theory or statistical metrics, perform the worst in all experiments. On the other hand, the popular tf.idf method does not perform uniformly well across different data sets.
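The contrast between tf.idf and tf.rf can be illustrated with a small sketch. The abstract does not state the rf formula; the code below assumes the form usually cited for relevance frequency, rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term. The function names and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def doc_freqs(docs, labels, positive):
    """Per term, count the positive (a) and negative (c) documents containing it."""
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos if label == positive else neg
        target.update(set(tokens))  # set(): count each document at most once
    return pos, neg

def rf(a, c):
    # Assumed form of relevance frequency (not given in the abstract).
    return math.log2(2 + a / max(1, c))

def idf(n_docs, df):
    # Standard inverse document frequency; max() guards against unseen terms.
    return math.log(n_docs / max(1, df))

def tf_rf_vector(tokens, pos, neg):
    tf = Counter(tokens)
    return {t: tf[t] * rf(pos[t], neg[t]) for t in tf}

def tf_idf_vector(tokens, pos, neg, n_docs):
    tf = Counter(tokens)
    return {t: tf[t] * idf(n_docs, pos[t] + neg[t]) for t in tf}

# Toy corpus (hypothetical): two "grain" documents and two "finance" documents.
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "harvest"],
    ["stock", "price", "fall"],
    ["stock", "market"],
]
labels = ["grain", "grain", "finance", "finance"]
pos, neg = doc_freqs(docs, labels, positive="grain")

for term in ("wheat", "price", "stock"):
    print(term,
          "rf=%.3f" % rf(pos[term], neg[term]),
          "idf=%.3f" % idf(len(docs), pos[term] + neg[term]))
```

In this toy corpus "wheat" and "stock" each occur in two of the four documents, so idf cannot tell them apart, whereas rf gives "wheat" a higher weight for the "grain" category because it uses the class labels. This is the sense in which the weighting is supervised.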


Source
http://dx.doi.org/10.1109/TPAMI.2008.110

Publication Analysis

Top Keywords

term weighting: 32
weighting methods: 24
supervised term: 20
text categorization: 12
term: 9
weighting: 8
weighting method: 8
method tfrf: 8
supervised: 6
methods: 6
