NIST
Special Database 8
NIST Machine-Print
Database of Gray Scale and Binary Images (MPDB)
A sample of the data contained in this database is
available via anonymous ftp at sequoyah.ncsl.nist.gov
in the files sd8-README.txt
and sd8.tar.Z
[699K].
The NIST machine-printed
database contains gray scale and binary images of machine-printed pages.
There are 360
digitized pages on three CD-ROM discs. The set contains a total of 3,063,168
characters, an average of about 8,509 characters per page.
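As a quick check of the figures above, the per-page average can be recomputed from the two totals. This is only an arithmetic sketch; the constant names are illustrative and do not come from the database documentation.

```python
# Totals quoted in the description above.
TOTAL_CHARACTERS = 3_063_168
TOTAL_PAGES = 360

average = TOTAL_CHARACTERS / TOTAL_PAGES
print(f"{average:.1f} characters per page")  # 8508.8, which rounds to 8509
```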
A reference file
is included for each page. These reference files are the ASCII text
pages that were used to generate the original hardcopy that was digitized.
This database is
being distributed for use in the development and testing of Optical Character
Recognition (OCR) systems on a common set of images. This allows vendors
to report results with respect to this common image set.
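One common way to report results against reference text is character accuracy based on edit distance. The sketch below is an assumption about how such a score might be computed, not the scoring procedure defined by NIST; the function names and the sample strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of reference characters recovered, floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical reference text and OCR output for a short fragment.
reference_text = "Optical Character Recognition"
ocr_output = "0ptical Character Recogniton"
print(edit_distance(reference_text, ocr_output))                 # 2
print(round(character_accuracy(reference_text, ocr_output), 3))  # 0.931
```

In practice the reference string would be read from the ASCII reference file for a page and the hypothesis from the OCR system under test.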
Each disc in this
three-disc set holds approximately 593 megabytes of data when
the images are compressed. Uncompressed, each disc holds 1.1 gigabytes
of data (1.85:1 average compression ratio using JPEG and CCITT Group
4 compression schemes).
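The stated ratio follows from the two per-disc sizes. The sketch below assumes decimal units (1 gigabyte = 1000 megabytes), which the description does not specify; the constant names are illustrative.

```python
# Per-disc sizes quoted above. Decimal units are an assumption.
COMPRESSED_MB = 593
UNCOMPRESSED_MB = 1.1 * 1000

ratio = UNCOMPRESSED_MB / COMPRESSED_MB
print(f"{ratio:.2f}:1")  # 1.85:1
```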
The database has
the following features:
- 3 font styles:
Bold, Italics, and Normal
- 6 font types:
Courier, Helvetica, New Century Schoolbook, Optima, Palatino, and
Times Roman
- 10 point sizes:
4, 5, 6, 7, 8, 10, 11, 12, 15, 17, and 20
- pages in randomly generated
order and in sequential order
- 360 unique
pages each having a gray scale and binary representation
- 12 pixels/mm
resolution
- 360 text files
containing page reference answers
- image format
documentation and example software
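For readers used to dots per inch, the 12 pixels/mm resolution listed above converts as follows. The page dimensions in the usage line (8.5 x 11 inches) are an assumption for illustration only; the description does not state the physical page size, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4
PIXELS_PER_MM = 12  # resolution quoted in the feature list

# Equivalent resolution in pixels per inch (about 304.8).
dpi = PIXELS_PER_MM * MM_PER_INCH


def page_size_pixels(width_in: float, height_in: float) -> tuple:
    """Pixel dimensions of a page scanned at the resolution above."""
    return (round(width_in * dpi), round(height_in * dpi))


print(f"{dpi:.1f} pixels per inch")
# Assumed 8.5 x 11 inch page, not stated in the source.
print(page_size_pixels(8.5, 11))  # (2591, 3353)
```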
Suitable for automated
machine-print research, development, and evaluation, the data set can
be used for:
- algorithm development
- system training
and testing
- character segmentation:
separating a full-page image into characters
- character recognition:
identifying specific machine-printed characters
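To make the segmentation task concrete, here is a minimal sketch of one simple technique, a vertical projection profile over a binary text line. It is an illustration only, not a method described by the database or its example software, and the function name and the tiny sample raster are hypothetical.

```python
from typing import List, Tuple


def column_segments(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink.

    `image` is a binary raster for one text line: 1 = ink, 0 = background.
    """
    if not image:
        return []
    width = len(image[0])
    # Count ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                    # a run of inked columns begins
        elif not ink and start is not None:
            segments.append((start, x))  # blank column closes the run
            start = None
    if start is not None:
        segments.append((start, width))  # run reaches the right edge
    return segments


# Tiny made-up raster containing three separated glyph-like blobs.
line = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
]
print(column_segments(line))  # [(1, 3), (5, 6), (7, 10)]
```

A projection profile splits only where a fully blank column separates glyphs, so touching or kerned characters, which are likely at the smaller point sizes, would need a more robust approach.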
The database is
a valuable tool for measurement and comparison of system performance
on machine-print pages.
The Users' Guide describes how this
database works. A PDF version of the Users'
Guide is also available.
System
Requirements: CD-ROM drive with software to read ISO-9660
format.
Special pricing
for multiple copies available. Call for details.
Price:
$90.00 
For
more information on Special Database 8 please contact:
- Standard Reference
Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 2300
Gaithersburg, MD 20899-2310
(301) 975-2008 (VOICE) / (301) 926-0416 (FAX)
The
scientific contact for this database is:
- Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-2928
michael.garris@nist.gov
Keywords: ASCII Reference, automated
character recognition, automated data capture, binary, character recognition,
font size, full page, Grayscale Image Database, machine print, NIST, OCR,
optical character recognition, software recognition, style.