Validating Quality in Large-Scale Digitization

Distribution of Error: First Production Run Results

Error distribution is analyzed at two levels: the page and the volume. The page-level analysis treats all images in the full sample as a single population, without regard to their association with a particular volume or their position in the image sequence. The volume-level analysis aggregates data from the subsample page-image sequences to produce a measure of error incidence. The level of detail in the page-level error data permits statistically significant aggregation of findings from page to volume. Volume-level error aggregation is the foundation for establishing quality scores for digitized volumes, based on the relative number and severity of errors across a mix of error attributes. The featured analysis shows the distribution of error at the page level from the first production run. The sampling frame for the first production run consisted of volumes that were Google-digitized, in the public domain, in the English language, and published before 1923.
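The page-to-volume aggregation described above can be illustrated with a minimal sketch. It is not the project's actual scoring method. The severity scale, the weights in SEVERITY_WEIGHTS, the PageError record, the error-type labels, and the "weighted errors per sampled page" formula are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical severity weights: the source does not specify a severity
# scale or how severity is weighted, so these values are illustrative only.
SEVERITY_WEIGHTS = {1: 1.0, 2: 2.0, 3: 4.0}


@dataclass(frozen=True)
class PageError:
    """One error observed on one sampled page image (hypothetical record)."""
    volume_id: str
    page_seq: int
    error_type: str  # illustrative label, e.g. "blur"
    severity: int    # key into SEVERITY_WEIGHTS


def volume_error_scores(errors, pages_sampled):
    """Aggregate page-level errors into one score per volume.

    The score is the severity-weighted error count divided by the number
    of pages sampled from that volume, so volumes with different sample
    sizes can be compared. Lower is better; 0.0 means no errors recorded.
    """
    weighted = defaultdict(float)
    for err in errors:
        weighted[err.volume_id] += SEVERITY_WEIGHTS[err.severity]
    return {
        vol: weighted.get(vol, 0.0) / n
        for vol, n in pages_sampled.items()
        if n > 0  # skip volumes with no sampled pages
    }


# Made-up example data, not results from the study.
errors = [
    PageError("vol-a", 3, "blur", 1),
    PageError("vol-a", 7, "cropped text", 3),
    PageError("vol-b", 2, "skew", 2),
]
pages_sampled = {"vol-a": 100, "vol-b": 50, "vol-c": 80}
scores = volume_error_scores(errors, pages_sampled)
```

Dividing by the number of sampled pages reflects the "relative number" of errors mentioned in the text, and the weights stand in for severity. A real scoring scheme might also weight error attributes differently or use a different normalization.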