Click-words: learning to predict document keywords from a user perspective.

Bioinformatics

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Published: November 2010

Motivation: Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by statistical measures. As an alternative, we propose using the popularity of a document's words in user queries to identify click-words, a set of prominent words from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords.

Results: We developed a machine learning approach to learn the unique characteristics of click-words. Each word was represented by a set of features that included different types of information, such as semantic type, part-of-speech tag, term frequency-inverse document frequency (TF-IDF) weight and location in the abstract. We identified the most important features and evaluated our model using 6 months of PubMed click-through logs. Our results suggest that, in addition to carrying high TF-IDF weight, click-words tend to be biomedical entities, to appear in article titles, and to occur repeatedly in article abstracts. Given the abstract and title of a document, we are able to accurately predict the words likely to appear in user queries that lead to document clicks.
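
As an illustration of the kind of per-word representation described above, the following is a minimal sketch and not the authors' implementation. It covers only the features that can be computed from raw text: a TF-IDF weight, presence in the title, number of occurrences in the abstract, and relative position of the first occurrence in the abstract. The function names, the tokenizer, the smoothed IDF formula and the `corpus` argument are assumptions made for this example; semantic type and part-of-speech features would need external tools and are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into words, keeping hyphenated terms whole (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def word_features(title, abstract, corpus):
    """Build a feature dict for every word in one document's title and abstract.

    `corpus` is a hypothetical list of document strings used only to
    estimate document frequencies for the IDF term.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document

    title_tokens = tokenize(title)
    abstract_tokens = tokenize(abstract)
    title_set = set(title_tokens)
    abstract_counts = Counter(abstract_tokens)
    term_freq = Counter(title_tokens + abstract_tokens)

    # Position of each word's first occurrence in the abstract.
    first_index = {}
    for i, word in enumerate(abstract_tokens):
        first_index.setdefault(word, i)

    features = {}
    for word, tf in term_freq.items():
        # Smoothed IDF (an assumption; the abstract does not give the exact variant).
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
        if word in first_index:
            rel_pos = first_index[word] / len(abstract_tokens)
        else:
            rel_pos = None  # word occurs only in the title
        features[word] = {
            "tfidf": tf * idf,
            "in_title": word in title_set,
            "abstract_count": abstract_counts[word],
            "first_rel_position": rel_pos,
        }
    return features
```

Feature dictionaries of this form could then be passed to any standard classifier trained on click-through labels; which learner to use is left open here, since the abstract does not name one.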

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2958742
DOI: http://dx.doi.org/10.1093/bioinformatics/btq459
