Review: An Evaluation of Automatic Text Categorization in Online Discussion Analysis
Written by Kevin Chai   
Thursday, 21 February 2008 21:02
Authors: Lui, A.K.F., Li, S.C. & Choy, S.O.
Year: 2007
Published in: Proceedings of the Seventh IEEE International Conference on Advanced Learning Technologies (ICALT)
Link: http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/4280926/4280927/04280992.pdf?isnumber=4280927?=CNF&arnumber=4280992&arSt=205&ared=209&arAuthor=Lui%2C+Andrew+Kwok-Fai%3B+Li%2C+Siu+Cheung%3B+Choy%2C+Sheung+On

Abstract

Content analysis is often employed by teachers and researchers to analyse online discussion forums to serve various purposes such as assessment, evaluation, and educational research. Automating content analysis is desirable so that such analysis can be carried out efficiently on large amounts of data. This paper evaluates text categorization and examines whether the attainable accuracy can satisfy the requirements of common content analysis tasks. It shows that even simple text categorization techniques can support tasks such as online learning progress monitoring. Methods techniques are also discussed.

Review

This paper evaluates three text categorisation schemes for analysing online discussions between students in a teaching and learning discussion forum. The text categorisation classifier used in this research comprises a vector-space model coupled with latent semantic analysis (LSA) or a Naive Bayes (NB) classifier. The research is divided into three experiments, in each of which the accuracy of the classifier is measured against manual content analysis conducted by a person. In the first experiment, forum messages are placed into two categories: academic (messages containing academic-related content) and general.
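To make the first experiment more concrete, here is a minimal sketch of a Naive Bayes classifier that sorts messages into academic and general categories. It is not the authors' implementation: the tokeniser, the add-one smoothing, the class name `NaiveBayes` and the toy training messages are all my own assumptions, chosen only to illustrate the technique.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of messages
        self.vocab = set()
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior of each label for a message."""
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data, not taken from the paper.
train = [
    ("How does TCP handle packet loss on a network?", "academic"),
    ("The java compiler reports an error in my class", "academic"),
    ("Video compression uses key frames and codecs", "academic"),
    ("Anyone want to grab lunch after the tutorial?", "general"),
    ("Happy holidays everyone, see you next term", "general"),
    ("Thanks, have a great weekend", "general"),
]
clf = NaiveBayes().fit(*zip(*train))
print(clf.predict("Why does my java class not compile?"))
print(clf.predict("See you at lunch next weekend"))
```

With so little training data the output only demonstrates the mechanics. A realistic evaluation would, as in the paper, compare predictions against labels assigned manually by a person.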

The second experiment classifies these academic messages into knowledge seeking (a message that contains an academic-related question) and knowledge contributing (a message that contains a response to an academic-related question) categories. Lastly, five distinct domain-specific topics were created (i.e. networking, audio, image, video and Java programming) under which academic messages are classified. It should be noted that topics can also be created for domains unrelated to teaching and learning. The second experiment is particularly interesting because its two message categories were used as two parameters in the user contribution measurement (UCM) model developed in my Honours thesis. However, the approach the UCM model employed to determine whether a forum post was a question or a response to a question was rather simplistic. The text categorisation ideas presented in this paper will be investigated as a way to improve the accuracy of parameter categorisation and capture in the UCM model. The authors also note that natural language processing approaches, such as a part-of-speech tagger, would further improve the categorisation of seeking and contributing messages.
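For illustration, a surface-cue baseline for separating seeking from contributing messages could look like the sketch below. This is a purely hypothetical rule set: it is neither the paper's classifier nor the actual rule used in the UCM model, and the `QUESTION_OPENERS` list and function names are my own assumptions.

```python
import re

# Hypothetical cue words; a real system would learn these or use a POS tagger.
QUESTION_OPENERS = {
    "who", "what", "when", "where", "why", "how", "which",
    "can", "could", "does", "do", "is", "are", "should", "would",
}


def is_knowledge_seeking(message):
    """Guess whether a message asks a question, using surface cues only."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", message.strip())
    for sentence in sentences:
        if sentence.endswith("?"):
            return True
        words = re.findall(r"[a-z']+", sentence.lower())
        if words and words[0] in QUESTION_OPENERS:
            return True
    return False


def categorise(message):
    """Label a message as 'seeking' or 'contributing'."""
    return "seeking" if is_knowledge_seeking(message) else "contributing"


print(categorise("How do I open a socket in Java?"))
print(categorise("You need to call accept() on the ServerSocket."))
```

A rule like this would label a reply such as "What you need is a larger buffer." as seeking, because it only looks at the first word. That kind of error is where part-of-speech information or a trained classifier should help.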

Additionally, the experimental results from this paper were promising but also highlight the need for further improvement to achieve higher levels of text categorisation accuracy. Accuracy is particularly important if the results generated by these models are used to assess students' performance (marks / grades) and/or for revenue sharing. The authors suggest that a semi-automated mechanism, i.e. automated content analysis combined with manual judgement, appears to be the most promising approach until text categorisation accuracy can be improved.
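One way to picture such a semi-automated mechanism is a confidence threshold: predictions the classifier is sure about are accepted automatically, and the rest are queued for a person to judge. The sketch below assumes the classifier outputs a probability per category. The `triage` function, the 0.8 threshold and the example probabilities are hypothetical and are not taken from the paper.

```python
def triage(probabilities, threshold=0.8):
    """Accept the top label if confident enough, else flag for manual review.

    probabilities: dict mapping label -> probability (assumed to sum to 1).
    threshold: hypothetical cut-off; it would need tuning on labelled data.
    """
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    if confidence >= threshold:
        return label, "auto"
    return label, "manual-review"


# Hypothetical classifier outputs for three forum posts.
posts = {
    "post-1": {"academic": 0.95, "general": 0.05},
    "post-2": {"academic": 0.55, "general": 0.45},
    "post-3": {"academic": 0.10, "general": 0.90},
}
for post_id, probs in posts.items():
    print(post_id, triage(probs))
```

Raising the threshold sends more posts to manual review and leaves fewer automated mistakes, which seems the right trade-off when the output feeds into student grades.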

Important New Terms
  • Automated content analysis
  • Text categorization
  • Online learning progress monitoring
  • Accuracy of analysis acceptable for student grading
  • Natural language processing
  • Vector-space model
  • Latent semantic analysis (LSA)
  • Naive Bayes classifier
  • Part-of-speech tagger
  • Knowledge seeking and contributing
