NIST Machine-Print Database of Gray Scale and Binary Images (MPDB)


This database has been discontinued and is no longer available.
 

The NIST machine-print database, formerly part of the Special Databases collection, contained gray scale and binary images of machine-printed pages. It was previously known as Special Database 8.

The set contained a total of 3,063,168 characters, an average of about 8,509 characters per page.
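
The per-page average can be checked from the figures quoted on this page. The short sketch below assumes the page count is the 360 unique pages given in the feature list; the variable names are illustrative only.

```python
# Figures quoted on this page.
total_characters = 3_063_168
page_count = 360  # assumption: the 360 unique pages named in the feature list

# Average number of characters per page.
average_per_page = total_characters / page_count

print(f"{average_per_page:.1f} characters per page")  # prints "8508.8 characters per page"
print(f"rounded: {round(average_per_page)}")           # prints "rounded: 8509"
```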

A reference file was included for each page. These reference files are the ASCII text pages that were used to generate the original hardcopy that was digitized.

The database was distributed for use in developing and testing Optical Character Recognition (OCR) systems on a common set of images, which allowed vendors to report results against the same image set.

The database had the following features:

  • 3 font styles: Bold, Italics, and Normal
  • 6 font types: Courier, Helvetica, New Century Schoolbook, Optima, Palatino, and Times Roman
  • 10 point sizes: 4, 5, 6, 7, 8, 10, 11, 12, 15, 17, and 20
  • pages in randomly generated order and in sequential order
  • 360 unique pages each having a gray scale and binary representation
  • 12 pixels/mm resolution
  • 360 text files containing page reference answers
  • image format documentation and example software
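
The feature counts above appear to be related by simple arithmetic. The sketch below rests on an assumption that this page does not state: that one page was produced for each combination of font style, font type, point size, and ordering. It uses the stated count of 10 point sizes; note that the size list itself shows eleven values, which would give 396 combinations instead. It also converts the 12 pixels/mm resolution to dots per inch.

```python
MM_PER_INCH = 25.4

font_styles = ["Bold", "Italics", "Normal"]
font_types = [
    "Courier", "Helvetica", "New Century Schoolbook",
    "Optima", "Palatino", "Times Roman",
]
point_size_count = 10                 # the count stated in the feature list
orderings = ["random", "sequential"]  # assumption: two orderings per combination

# Assumption: one page per (style, type, size, ordering) combination.
combinations = (
    len(font_styles) * len(font_types) * point_size_count * len(orderings)
)

# Convert the scanning resolution from pixels/mm to dots per inch.
pixels_per_mm = 12
dots_per_inch = pixels_per_mm * MM_PER_INCH

print(f"{combinations} combinations")
print(f"{dots_per_inch:.1f} dpi")
```

Under these assumptions the product matches the 360 unique pages, and the resolution works out to roughly 305 dpi.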


Suitable for automated machine-print research, development, and evaluation, the data set can be used for:

  • algorithm development
  • system training and testing
  • character segmentation: separating a full-page image into characters
  • character recognition: identifying specific machine-printed characters


The database was a valuable tool for measurement and comparison of system performance on machine-print pages.

The contact for this database is:

Karen Marshall
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Gaithersburg, MD 20899-8940
Phone: (301) 975-8296
karen.marshall [at] nist.gov

Keywords: ASCII Reference, automated character recognition, automated data capture, binary, character recognition, font size, full page, Grayscale Image Database, machine print, NIST, OCR, optical character recognition, software recognition, style.

Created August 27, 2010, Updated April 5, 2023