Validating Quality in Large-Scale Digitization

Metrics


Error Typology
Severity Scale
Helpful Definitions

ERROR TYPOLOGY


Text Errors - primarily affect the textual content of the page, whether originally printed, annotated, or library-supplemented.

  • Broken Text - Character break-ups and unresolved fonts.
    This is also described as "thin" characters. Parts of a character can be thinned out or missing, which leads to characters that are not continuous. Broken text can also appear as patches of light text in contrast to surrounding text, where the text has evidently been lightened by some error during the scanning process.
  • Thick Text - Character fill; may appear as "bold," perhaps inconsistently so.
    Characters are relatively heavy and dark, so that not all parts of the character can be distinguished. Parts of the character are filled in; for example, the top part of the character "A" is completely filled due to thickness. Normally, parts of the character do not touch, but when a character is thick, parts of one character may touch or blend into another character.
Graphics Errors - pertain only to graphic content. Graphics are non-text, graphical/figure content. Examples include printed illustrations, complex mathematical formulas, musical notations, elaborated typography, and library added content such as bar codes, book plates, and circulation items.
  • Color Fidelity - Issues with imbalance and gradient shifts that cause the color presented in the digital image to vary from the original material.
    Color becomes an error when the original color of the material appears to be incorrectly or inconsistently represented in the scanned illustration. Color fidelity errors may occur during image processing, when the color information is interpolated to fill each pixel with an appropriate color value. The colors must then be adjusted in accordance with the color temperature of the light source. In the process, some colors are lost, others are shifted, and saturation is reduced to minimize color noise, which means that the color represented in the digital image may vary from the original material. In the absence of direct comparison with the original source, this error is detected by logical or visual deduction.
  • Scanner Effects - Visually detectable patterns that result from the interaction of scanner and printing technologies.
    Effects include but are not limited to moiré patterns and half-tone gridding. A moiré pattern is an irregular, plaid-like pattern that occurs when a bit-mapped image is reduced, enlarged, displayed, or printed at a resolution different from the resolution of the original.
Text or Graphic Error - pertains to either text or graphical content.
  • Tone - Issues associated with brightness or contrast ratio. Contrast is how much variance there is between the brightest white and darkest black in an image.
    A low contrast ratio makes text or graphics more difficult to view or interpret, whereas with a high contrast ratio the image appears bright, which is best for viewing. As a tone error worsens, details of the illustration or text may become uniformly indistinguishable. Contrast errors tend to apply to the entire page or, alternatively, to blocked portions of graphics and text.
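
The definition of contrast above can be made concrete with a small calculation. The sketch below assumes an 8-bit grayscale image supplied as a flat list of pixel values (0 = black, 255 = white) and uses the normalized spread between the darkest and brightest pixel as a stand-in for contrast. Both this measure and the function name tonal_spread are illustrative assumptions, not a metric defined by this document.

```python
def tonal_spread(pixels):
    """Return (brightest - darkest) / 255 for 8-bit grayscale pixel values.

    Assumed stand-in for contrast: 0.0 means every pixel has the same
    value; 1.0 means the image spans pure black to pure white.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    return (max(pixels) - min(pixels)) / 255
```

A page whose pixels all fall between 100 and 120 would score 20/255 on this measure, which is consistent with the washed-out, hard-to-read appearance described for low contrast.
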
Page-image Errors - individually identifiable errors that affect the visual appearance or structure of a single bitmap page. A particular error may be confined to a single page or repeat across a sequence of pages in a volume. The errors typically occur as a result of post-scan processing of the image file; perhaps less frequently, errors may also originate with the physical volume itself.
  • Blur - Apparent movement of the intended page during the scanning process and/or image fuzziness and indistinctness.
    The text or image appears "not crisp." Error can result from scanning processes or from faulty printing processes in the original volume. Blur may affect the entire page image or a significant portion of the page image.
  • Colorization - Miscoloration that may affect all or part of the page image.
    The error may result from digital processing procedures intended to improve contrast between paper and text, or may be an artifact of the original source material that was retained in scanning and not removed through post-processing. Examples of colorization include text bleed-through or inconsistent color values for text, graphics, or background. Colorization of black/white text captured as color occurs during the cleaning process of the digital image. The cleaning process attempts to make the background white and eliminate any text bleed-through, but sometimes the process is faulty and colorization occurs on the image.
  • Crop - Some portion of the page is missing.
    Affects the display of the text/image block and may be associated with the gutter of the binding or the external edges of the page. During image processing, the image is cropped to eliminate any visible scanner background, equipment, etc. With volumes that have very tight, narrow gutters, this can cut off characters and whole blocks of text or image. Under-cropping occurs when the image is not cropped enough and the scanning cradle is visible. Cropping may also be present in the source volume due to faulty publishing practices or binding processes.
  • Obscured - Portions of content blocked by original material issues, non-book elements or through post-scan image processing.
    Obscured errors can be classified into two types: original material issues (including library-added content) and processing errors. The material itself may have problems that are present before any digital imaging is undertaken. Original material issues can include user annotations that affect the appearance of the image, other added elements, and library-added content such as labeling or collection notes. Processing errors occur when the scanning vendor applies an algorithm that attempts to enhance the image by eliminating stains, discolorations, colorization issues, and other elements on the original material. The cleaning algorithm may fail, causing areas of the image to be obscured by blotches or introducing elements that obscure content.
  • Skew - The entire image is rotated or tilted from exact vertical or horizontal alignment.
    May result from digital processing of a scanned image or from publishing/binding anomalies.
  • Warp - The alignment of the page image is not precisely correct; parts of the page appear buckled, twisted, or distorted.
    The page can also be noticeably curved in any direction, with elongation or distortion of text/image relative to the rest of the image. Warp is distinguishable from skew in that warp results in an inconsistent image; one, two, or three sides of the text/illustration block may appear to be misaligned.
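
For recording review results, the typology above could be captured as a small lookup structure. The following Python sketch is one possible encoding and is not part of the metrics themselves; the names ErrorCategory, ERROR_TYPES, and category_of are hypothetical.

```python
from enum import Enum


class ErrorCategory(Enum):
    """Top-level groups from the error typology (identifiers are illustrative)."""
    TEXT = "Text Errors"
    GRAPHICS = "Graphics Errors"
    TEXT_OR_GRAPHIC = "Text or Graphic Error"
    PAGE_IMAGE = "Page-image Errors"


# Each error type from the typology, mapped to the group it is listed under.
ERROR_TYPES = {
    "Broken Text": ErrorCategory.TEXT,
    "Thick Text": ErrorCategory.TEXT,
    "Color Fidelity": ErrorCategory.GRAPHICS,
    "Scanner Effects": ErrorCategory.GRAPHICS,
    "Tone": ErrorCategory.TEXT_OR_GRAPHIC,
    "Blur": ErrorCategory.PAGE_IMAGE,
    "Colorization": ErrorCategory.PAGE_IMAGE,
    "Crop": ErrorCategory.PAGE_IMAGE,
    "Obscured": ErrorCategory.PAGE_IMAGE,
    "Skew": ErrorCategory.PAGE_IMAGE,
    "Warp": ErrorCategory.PAGE_IMAGE,
}


def category_of(error_type: str) -> ErrorCategory:
    """Return the typology group for an error type; raise on unknown names."""
    try:
        return ERROR_TYPES[error_type]
    except KeyError:
        raise ValueError(f"Unknown error type: {error_type!r}") from None
```

Keeping the mapping in one table means a reviewer tool can reject misspelled error names early, rather than silently creating a new category.
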


PROPOSED SEVERITY SCALE

  • 0 - Default - Error is undetectable on the page.
    This is the baseline assumption.
  • 1 - Error exists but has a negligible effect on the Original Content.
    As the first step up the error scale from "No detectable error," this code is used to indicate the detection of minor errors that are noticeable but inconsequential. The detected error either does not affect the Original Content (e.g., blemishes or discoloration that occur clear of the main text or illustrations on a page) or affects it in a negligible way.
  • 2 - Error clearly alters appearance of Original Content, but has a negligible effect on reading ability.
    The error is immediately detectable with minimal observation (review for less than 3 seconds) but does not introduce above-average difficulty in recognizing components contained in the Original Content (on a scale of 1 to 10, the impact on reading ability is a 1 or 2).
    Test: I see the error, how hard am I working?
  • 3 - Error clearly alters appearance of Original Content and has a clear negative impact on reading ability.
    The error is immediately detectable, and is such that the reviewer must expend above-average effort to make out components of the Original Content. Minimal inference may be needed to make out Original Content.
  • 4 - Nearly unable to decipher Original Content in affected area of the page; significant inference required by reviewer to obtain legibility and meaning.
    In review of a page with error of this severity, the reviewer or reader would have to pause and study the content of the page, and make significant inferences to determine the information. Level four requires a higher level of subjective assessment of legibility. With effort, however, all components of the Original Content can be unambiguously deciphered.
  • 5 - Original Content in affected area of the page cannot be unambiguously deciphered.
    The affected area could range from a very small area (one or more words) to the legible areas of the entire page. Intellectual content (text or image) is lost.
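
The scale above is ordinal, so it maps naturally onto an integer enumeration. The Python sketch below is one possible encoding under stated assumptions: the member names, the PageError record, and the choice to summarize a page by its worst error are all hypothetical and are not prescribed by the scale. The content_is_recoverable helper reflects the distinction drawn between level 4 (all content can still be deciphered with effort) and level 5 (content is lost).

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Proposed severity scale, 0 through 5 (member names are illustrative)."""
    UNDETECTABLE = 0       # default: error is undetectable on the page
    NEGLIGIBLE = 1         # negligible effect on Original Content
    ALTERS_APPEARANCE = 2  # visible, negligible effect on reading ability
    IMPAIRS_READING = 3    # clear negative impact on reading ability
    NEARLY_ILLEGIBLE = 4   # significant inference required
    CONTENT_LOST = 5       # cannot be unambiguously deciphered


@dataclass(frozen=True)
class PageError:
    """One detected error on one page (hypothetical record layout)."""
    page: int
    error_type: str
    severity: Severity


def worst_severity(errors):
    """Summarize errors by the highest severity; 0 if none were found."""
    return max((e.severity for e in errors), default=Severity.UNDETECTABLE)


def content_is_recoverable(severity):
    """True for levels 0-4, where content can still be deciphered."""
    return severity < Severity.CONTENT_LOST
```

For example, a page carrying a level 2 blur and a level 4 crop would be summarized as level 4 under this assumed rule, and its content would still count as recoverable.
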


HELPFUL DEFINITIONS


Original Content - the text or image content on the page created through the original printing process. Original content excludes marginalia, annotations, and other content added by users after the acquisition of the volume by the library. Original content excludes library added content such as penciled call numbers, call number labels, book plates, circulation aids, and bar codes.

Error - variations from the expected appearance of Original Content that are detected by quality reviewers. In full production review, the time reviewing each page is minimal. Errors are those that can be detected by the average reviewer in less than 3 seconds of review.

Reading ability - the ability of an average reviewer to make out letters, illustrations, and other content contained in the Original Content of a page.

Inference - the degree to which an average reviewer cannot make out Original Content, but must use contextual information to determine letters, words, or other information that compose the Original Content.





