Review: A Framework to Predict the Quality of Answers with Non-Textual Features
Written by Kevin Chai   
Monday, 03 March 2008 19:13
Authors: Jeon, J., Croft, W.B., Lee, J.H. & Park, S.
Year: 2006
Published in: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Link: http://ciir.cs.umass.edu/personnel/IR-469.pdf
Importance to my research: High

Abstract

New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.

Review

The authors propose a framework for estimating document quality from non-textual features, evaluated on a Korean question-and-answer portal. The framework uses kernel density estimation (KDE) and a maximum entropy approach to handle non-textual features and to build a stochastic process that predicts document quality from the selected features. The resulting predictor was found to distinguish between good and bad answers. Question quality, however, was not evaluated in their experiment: the authors observed that bad questions led to bad answers, so they decided to estimate only the quality of answers.
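To make the maximum entropy step more concrete, here is a minimal sketch of my own, not the authors' implementation. It assumes a binary good/bad label, in which case a maximum entropy model over real-valued features has the same form as logistic regression. The feature values, the function names `train_maxent` and `predict_good`, and the plain gradient-ascent optimiser are all my assumptions; the paper's terminology points to iterative scaling and limited-memory variable metric methods instead.

```python
import numpy as np

def _with_bias(features):
    """Append a constant 1.0 column so the model can learn an intercept."""
    X = np.asarray(features, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_maxent(features, labels, learning_rate=0.5, iterations=2000):
    """Fit a binary maximum entropy (logistic) model by gradient ascent.

    Gradient ascent is a stand-in optimiser chosen for brevity; it is
    not the training procedure used in the paper.
    """
    X = _with_bias(features)
    y = np.asarray(labels, dtype=float)
    weights = np.zeros(X.shape[1])
    for _ in range(iterations):
        p_good = 1.0 / (1.0 + np.exp(-X @ weights))
        # Gradient of the average log-likelihood with respect to the weights.
        weights += learning_rate * (X.T @ (y - p_good)) / len(y)
    return weights

def predict_good(weights, features):
    """Return P(good answer | features) for each row of features."""
    return 1.0 / (1.0 + np.exp(-_with_bias(features) @ weights))

# Invented toy data: [acceptance ratio, normalised recommendation count].
train_features = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9],
                  [0.2, 0.1], [0.3, 0.2], [0.1, 0.4]]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = good answer, 0 = bad answer

weights = train_maxent(train_features, train_labels)
scores = predict_good(weights, [[0.85, 0.7], [0.15, 0.2]])
```

With this toy data the first test row (high acceptance ratio, many recommendations) should score above 0.5 and the second below it. Because each feature contributes to the score through a single weight, the score can only rise or fall steadily with that feature's value.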

The following 13 non-textual features were evaluated in the developed framework:

  1. Answerer's acceptance ratio *
  2. Answer length *
  3. Questioner's self evaluation *
  4. Answerer's activity level *
  5. Answerer's category specialty
  6. Print counts *
  7. Copy counts *
  8. Users' recommendation *
  9. Editor's recommendation
  10. Sponsor's recommendation
  11. Click counts *
  12. Number of answers *
  13. Users' dis-recommendation *
* - this feature may be applicable in a generic user contribution measurement (UCM) model

The maximum entropy model requires monotonic features, meaning features for which a larger value always represents stronger evidence. For example, the number of recommendations is a monotonic feature, because more recommendations generally mean better quality. The length of an answer, however, is not monotonic, because a longer answer is not necessarily a better one. Using a Gaussian KDE technique, the authors converted answer length into a monotonic feature: the probability that an answer is of good quality given its length.
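The conversion described above can be sketched as follows. This is my own illustration under stated assumptions, not the paper's code: I assume the class-conditional length densities are estimated separately for good and bad answers with a Gaussian kernel and then combined with Bayes' rule. The bandwidth, the toy answer lengths and the function names `gaussian_kde` and `quality_given_length` are all hypothetical.

```python
import numpy as np

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of samples with Gaussian kernels."""
    samples = np.asarray(samples, dtype=float)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - samples[None, :]) / bandwidth
        kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
        return kernel.sum(axis=1) / (len(samples) * bandwidth)

    return density

def quality_given_length(good_lengths, bad_lengths, bandwidth):
    """Return a function mapping answer length to P(good | length) via Bayes' rule."""
    f_good = gaussian_kde(good_lengths, bandwidth)
    f_bad = gaussian_kde(bad_lengths, bandwidth)
    prior_good = len(good_lengths) / (len(good_lengths) + len(bad_lengths))

    def prob_good(length):
        numerator = prior_good * f_good(length)
        denominator = numerator + (1.0 - prior_good) * f_bad(length)
        # Guard against division by zero far away from every sample.
        return numerator / np.maximum(denominator, 1e-300)

    return prob_good

# Invented toy answer lengths (in characters), labelled good or bad.
good_lengths = [80, 120, 150, 200, 260, 300]
bad_lengths = [5, 10, 20, 30, 900, 1200]

prob_good = quality_given_length(good_lengths, bad_lengths, bandwidth=60.0)
probs = prob_good([10, 180, 1100])  # very short, medium, very long
```

On this toy data the medium-length answer gets a much higher probability than the very short or very long one. The raw length is not monotonic, but the derived probability is: a larger value always means stronger evidence of quality, so it can be passed to the maximum entropy model in place of the raw length.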

This quality probability initially increases as answers get longer but eventually declines. The training data showed that the probability of an answer being of high quality is high for average-length answers but low for very long answers. The authors converted a few other non-monotonic features in the same way and recalculated the correlation coefficient of each feature. After conversion, the answer length probability became the most significant feature in their framework. I believe I might be able to apply some of the concepts and techniques from this paper to a generic UCM model. However, I am not yet sure whether I should adopt probability models to provide an objective measure of contribution.

Important New Terms
  • Language modeling-based retrieval model
  • Maximum entropy - monotonic features
  • Non-textual features
  • Kernel density estimation (KDE) - Gaussian kernel
  • Query likelihood retrieval model
  • Document language model and collection model
  • Image annotation
  • Probability of a good (quality) answer based on answer length
  • Iterative scaling
  • Limited memory variable metric
