VALIDATING QUALITY IN LARGE-SCALE DIGITIZATION
The Institute of Museum and Library Services (IMLS) awarded a grant of $674,000 to the School of Information (SI) at the University of Michigan (Ann Arbor) to define, test, and apply measures of image and text quality in a very large collection of digitized books and serials. The two-year project (2011-12) is a collaboration between SI and the University of Michigan Libraries, with important contributions from the HathiTrust Digital Library and the University of Minnesota Libraries.
The large-scale digitization of books and serials is generating extraordinary collections of intellectual content that are transforming teaching and scholarship. Questions are being raised, however, regarding the quality and usefulness of digital surrogates produced by third-party vendors and deposited for preservation in digital repositories. For preservation repositories and their communities of users to trust that digital documents have the capacity to meet the uses envisioned for them, repositories must validate the quality and fitness for use of the objects they preserve. This research project is designed to develop and test a methodology for assessing the quality of digital surrogates and to validate the findings with groups of end-users.
The project builds upon a planning effort in 2009-10, sponsored by the Andrew W. Mellon Foundation, to formulate a research methodology for evaluating the quality of digitized books and journals held by research libraries but produced in large-scale digitization programs by Google, the Internet Archive, and other third-party vendors.
The HathiTrust Digital Library serves as the source of digitized books and serials for the project, which has two overlapping phases.
- Research Phase 1 (2011) focuses on defining a model of digitization error and a scale for recording the severity of observed error consistently and accurately.
- Research Phase 2 (2011-12) focuses on applying the research methodology developed in Phase 1 and validating the results of the error analysis in the context of specific use-case scenarios.




