Contact

Standard Reference Data, NIST:
100 Bureau Drive, Stop 2300
Gaithersburg, MD 20899-2300

Tel: 301-975-2200
Fax: 301-926-0416

If you have any questions regarding this website, or notice any problems or inaccurate information, please contact the webmaster by sending e-mail to: data@nist.gov

NIST Special Database 8

NIST Machine-Print Database of Gray Scale and Binary Images (MPDB)

 
Price: $90.00 (order online, by fax, or by mail)

Effective immediately, there is a minimum $30.00 shipping charge for all international shipments of databases via UPS International. Customers are responsible for their own duties, taxes, and VAT. Contact 301-975-2200 or data@nist.gov if you have questions.

A sample of the data contained in this database is available via anonymous ftp at sequoyah.ncsl.nist.gov in the files sd8-README.txt and sd8.tar.Z [699K].

The NIST machine-printed database contains gray scale and binary images of machine-printed pages.

There are 360 digitized pages on three CD-ROM discs. There are a total of 3,063,168 characters in the set which is an average of 8509 characters per page.

A reference file is included for each page. These reference files are the ASCII text pages that were used to generate the original hardcopy that was digitized.

This database is being distributed for use in the development and testing of Optical Character Recognition (OCR) systems on a common set of images. This allows vendors to report results with respect to this common image set.
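As an illustration of how a page's reference file could be used to score an OCR result, here is a minimal sketch. It is not NIST's scoring procedure: the function name `character_accuracy` and the sample strings are hypothetical, and the metric (fraction of reference characters matched in order, found with Python's `difflib.SequenceMatcher`) is only one simple choice among many.

```python
from difflib import SequenceMatcher


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of reference characters matched, in order, by the OCR output."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    # autojunk=False keeps frequent characters (e.g. spaces) from being ignored
    # on long pages.
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Hypothetical strings standing in for a reference file and an OCR result.
reference_text = "The quick brown fox"
ocr_text = "The quick brovvn fox"
score = character_accuracy(reference_text, ocr_text)
print(f"character accuracy: {score:.3f}")
```

In practice the two strings would be read from a page's reference text file and from the recognizer's output for the matching image. Note that `SequenceMatcher` does not guarantee a minimal edit distance, so a dedicated alignment method may be preferable for reported results.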

Each disc in this three-disc set holds approximately 593 megabytes of data when the images are compressed.
Uncompressed, each disc holds 1.1 gigabytes of data (1.85:1 average compression ratio using JPEG and CCITT Group 4 compression schemes).
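The size figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the text, plus one assumption that the text does not state: decimal units (1 gigabyte = 1000 megabytes). The per-page figure is a derived estimate covering both representations of a page, not a number from the database documentation.

```python
# Figures quoted in the text above.
TOTAL_PAGES = 360
TOTAL_CHARACTERS = 3_063_168
DISCS = 3
COMPRESSED_MB_PER_DISC = 593
UNCOMPRESSED_GB_PER_DISC = 1.1

# Assumption (not stated in the text): decimal units, 1 GB = 1000 MB.
MB_PER_GB = 1000

chars_per_page = TOTAL_CHARACTERS / TOTAL_PAGES
compression_ratio = UNCOMPRESSED_GB_PER_DISC * MB_PER_GB / COMPRESSED_MB_PER_DISC
total_compressed_mb = DISCS * COMPRESSED_MB_PER_DISC
# Derived estimate: compressed storage per page (gray scale plus binary image).
compressed_mb_per_page = total_compressed_mb / TOTAL_PAGES

print(f"characters per page:      {chars_per_page:.1f}")
print(f"compression ratio:        {compression_ratio:.2f}:1")
print(f"total compressed size:    {total_compressed_mb} MB")
print(f"compressed size per page: {compressed_mb_per_page:.2f} MB")
```

Under the decimal-unit assumption, the computed ratio agrees with the quoted 1.85:1 to within rounding.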

The database has the following features:

  • 3 font styles: Bold, Italics, and Normal
  • 6 font types: Courier, Helvetica, New Century Schoolbook, Optima, Palatino, and Times Roman
  • 10 point sizes: 4, 5, 6, 7, 8, 10, 11, 12, 15, 17, and 20
  • randomly ordered and sequentially ordered pages
  • 360 unique pages, each with a gray scale and a binary representation
  • 12 pixels/mm resolution
  • 360 text files containing page reference answers
  • image format documentation and example software
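One reading of the feature list that is consistent with the stated total of 360 pages is that each page corresponds to one combination of style, typeface, point size, and ordering. That reading is an assumption made here for illustration and is not stated in the text. The sketch below enumerates the combinations under that assumption, using the stated count of 10 point sizes rather than specific values, and also converts the stated resolution of 12 pixels/mm to pixels per inch.

```python
from itertools import product

STYLES = ["Bold", "Italics", "Normal"]
FONTS = [
    "Courier",
    "Helvetica",
    "New Century Schoolbook",
    "Optima",
    "Palatino",
    "Times Roman",
]
NUM_POINT_SIZES = 10  # count as stated in the feature list
ORDERINGS = ["random", "sequential"]

# Assumption (not stated in the text): one page per combination.
combinations = list(product(STYLES, FONTS, range(NUM_POINT_SIZES), ORDERINGS))
print(f"page combinations: {len(combinations)}")

# Unit conversion: 1 inch = 25.4 mm.
PIXELS_PER_MM = 12
MM_PER_INCH = 25.4
pixels_per_inch = PIXELS_PER_MM * MM_PER_INCH
print(f"resolution: {pixels_per_inch:.1f} pixels/inch")
```

The conversion shows that 12 pixels/mm is roughly 300 pixels per inch, a common scanning resolution for OCR work.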


Suitable for automated machine-print research, development, and evaluation, the data set can be used for:

  • algorithm development
  • system training and testing
  • character segmentation: separating a full-page image into characters
  • character recognition: identifying specific machine-printed characters


The database is a valuable tool for measurement and comparison of system performance on machine-print pages.

A PDF version of the Users' Guide is available.

System Requirements: CD-ROM drive with software to read ISO-9660 format.

Special pricing for multiple copies available. Call for details.

For more information on Special Database 8 please contact:

Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 2300
Gaithersburg, MD 20899-2310
(301) 975-2200 (VOICE) / (301) 926-0416 (FAX)

 

The scientific contact for this database is:

Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-2928
michael.garris@nist.gov

 

Keywords: ASCII Reference, automated character recognition, automated data capture, binary, character recognition, font size, full page, Grayscale Image Database, machine print, NIST, OCR, optical character recognition, software recognition, style.