Review: Data Quality in Context
Written by Kevin Chai   
Tuesday, 26 February 2008 18:29
Authors: Strong, D.M., Lee, Y.W. & Wang, R.Y.
Year: 1997 Published in: Communications of the ACM
Link: http://portal.acm.org/citation.cfm?id=253769.253804

Abstract

Data-Quality (DQ) problems are increasingly evident, particularly in organizational databases. Indeed, 50% to 80% of computerized criminal records in the U.S. were found to be inaccurate, incomplete, or ambiguous. The social and economic impact of poor-quality data costs billions of dollars [5-7, 10]. Organizational databases, however, reside in the larger context of information systems (IS). Within this larger context, data is collected from multiple data sources and stored in databases. From this stored data, useful information is generated for organizational decision-making.

Review

This 1997 article and other data quality (DQ) publications by the authors have been cited by two articles I have recently reviewed, "Measuring Information Quality of Web Sites: Development of an Instrument" and "Data Quality on the Web". A Google Scholar search showed that this paper had been cited by 257 papers, while one of the authors' earlier papers, "Beyond accuracy: what data quality means to data consumers", had accumulated 451 citations. Table 1 presents a now familiar description of four DQ categories and their dimensions, which I have encountered in my earlier DQ literature reviews.

This paper proposes DQ problem patterns found in DQ projects and provides a comprehensive discussion of intrinsic, accessibility and contextual DQ problem patterns. The proposed patterns are discussed with reference to case-study examples from 42 DQ projects within 3 organisations regarded as leaders for their attention to DQ. These patterns can help highlight potential causes of DQ problems by tracing issues through the life-cycle of data (i.e. from data producers to data custodians to data consumers).

Interestingly, the paper defines high-quality data as data that is fit for use by data consumers, and states that this is a widely adopted definition. However, DQ is also evaluated by providers (i.e. data custodians) in online communities and social software websites. A social software provider may adopt a revenue generation model and perceive high-quality content as the content that generates the most revenue. The same content, however, may not be perceived as high quality by the user community (i.e. data consumers).

If this social software provider adopts a revenue sharing model (i.e. gives a portion of its revenue to users who contribute content), it is likely to favour users who produce the most financially rewarding content. This is somewhat evident in many revenue sharing social software websites that implement an individual-based revenue sharing model (e.g. a user might receive 50% of the advertising revenue generated from their own content).

I foresee two problems with this approach: users may generate vast amounts of low-quality content, or they may focus on content that produces the most revenue (for advertising, this might mean writing content related to the keywords that pay the most). Either outcome may render the social software less attractive to users due to low-quality content, information overload and a lack of content diversity. In other words, a revenue sharing model that is not well thought out and implemented could eventually serve as an inhibitor to the development of a self-sustaining online community. Additionally, a community-based revenue sharing model (i.e. revenue generated from multiple sources and shared with users based on their contributions) should evaluate the quality of user contributions from the perspective of the user community.
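To make the contrast between the two models concrete, here is a minimal sketch. Everything in it is an assumption for illustration only: the function names, the 50% share rate, the two users and their revenue and quality figures are hypothetical, and the community-rated quality score is just one possible way of measuring contributions from the community's perspective.

```python
# Hypothetical sketch: all names, rates and figures are illustrative assumptions.

def individual_share(ad_revenue_by_user, rate=0.5):
    """Individual-based model: each user receives a fixed fraction of the
    advertising revenue generated by their own content."""
    return {user: revenue * rate for user, revenue in ad_revenue_by_user.items()}


def community_share(total_revenue, quality_by_user, rate=0.5):
    """Community-based model: a fraction of the pooled revenue is split in
    proportion to quality scores assigned by the user community."""
    pool = total_revenue * rate
    total_quality = sum(quality_by_user.values())
    if total_quality == 0:
        # No rated contributions, so nothing is paid out.
        return {user: 0.0 for user in quality_by_user}
    return {user: pool * q / total_quality for user, q in quality_by_user.items()}


# Assumed data: alice writes high-paying keyword content that the community
# rates poorly; bob earns less ad revenue but is rated highly.
ad_revenue = {"alice": 80.0, "bob": 20.0}
quality = {"alice": 1.0, "bob": 3.0}

individual = individual_share(ad_revenue)
community = community_share(sum(ad_revenue.values()), quality)

print(individual)  # {'alice': 40.0, 'bob': 10.0}
print(community)   # {'alice': 12.5, 'bob': 37.5}
```

With these assumed figures, the individual-based model rewards the user whose content earns the most advertising revenue, while the community-based model shifts the same pool towards the user whose contributions the community values most.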

Important New Terms
  • Intrinsic, accessibility & contextual data quality problem patterns
  • Trend analysis
  • Meeting data quality requirements that change over time
 
" The secret of science is to ask the right question, and it is the choice of problem more than anything else that marks the man of genius in the scientific world. "
Henry Tizard
