Validating Quality in Large-Scale Digitization

Inter-rater Reliability: Gold Standard Test

Significant inconsistency in how reviewers assign errors during manual review introduces error into the statistical evaluation of quality-review findings and undermines their reliability. To test for inter-rater reliability, pairs of coders from the University of Michigan and the University of Minnesota coded a training set of images with pre-determined levels of severity for each page-image error type. The results show a high level of consistency across versions of the error model.

Results
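
To illustrate how agreement between one pair of coders might be quantified, the sketch below computes Cohen's kappa, a common chance-corrected agreement statistic. The text does not state which statistic the study used, so the choice of kappa is an assumption made here for illustration. The function name cohens_kappa, the 0 to 3 severity scale, and the ten sample codes are likewise hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each code, the product of the two raters'
    # marginal proportions, summed over all codes.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical code throughout; kappa is
        # undefined (0/0), and this sketch reports full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical severity codes (0 = no error ... 3 = severe) that two
# coders assigned to the same ten page images for one error type.
coder_1 = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
coder_2 = [0, 1, 2, 3, 0, 1, 2, 2, 0, 0]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the coders agree on 8 of 10 images (observed agreement 0.80), chance agreement is 0.26, and kappa is approximately 0.73. Because severity levels are ordered, a weighted kappa that gives partial credit for near-misses may be a better fit; the unweighted form is shown only because it is the simplest to follow.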