The second NIST database of structured forms consists of 5,595 pages of binary, black-and-white images of synthesized documents containing hand-printed entries.
The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251, together with Schedules A, B, C, D, E, F, and SE. Eight of these forms contain two pages (form faces); therefore, there are 20 different form faces represented in the database.
The document images in this database appear to be real hand-printed forms prepared by individuals, but they have been automatically derived and synthesized by computer and contain no "real" tax data. The database represents 900 simulated tax submissions, averaging 6.22 form faces per submission.
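The counts above are internally consistent, and a quick arithmetic check makes that explicit. This is a minimal sketch that uses only the numbers stated in the description; it assumes each of the 5,595 pages corresponds to one form face (the description implies but does not state this), and it does not assign two-page status to specific forms because the description does not say which eight forms have two faces.

```python
# Arithmetic check of the NIST Special Database 6 counts, using only the
# figures given in the description. Assumption: one page == one form face.
forms = ["1040", "2106", "2441", "4562", "6251"]
schedules = ["A", "B", "C", "D", "E", "F", "SE"]
n_forms = len(forms) + len(schedules)          # 12 distinct tax forms

n_two_face = 8                                 # forms with two faces each
n_faces = n_two_face * 2 + (n_forms - n_two_face) * 1  # distinct form faces

n_pages = 5595
n_submissions = 900
faces_per_submission = n_pages / n_submissions # average faces per submission

print(n_forms, n_faces, round(faces_per_submission, 2))  # prints: 12 20 6.22
```

Eight two-face forms plus four single-face forms give 16 + 4 = 20 distinct faces, and 5,595 / 900 ≈ 6.217, which rounds to the stated average of 6.22.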
The database has the following features:
Suitable for both document processing and automated data capture research, development and evaluation, the database can be used for:
The database is a valuable tool for measuring system performance and comparing systems on complex forms.
A PDF version of the Users' Guide is available.
For more information on Special Database 6 please contact:
Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 6410
Gaithersburg, MD 20899-6410
(844) 374-0183 (Toll Free)
The scientific contact for this database is:
Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Gaithersburg, MD 20899-8940
mgarris [at] nist.gov
Keywords: ASCII Reference; character recognition; hand print; hand printed characters; NIST; OCR; optical character recognition; software recognition; synthesized documents; tax forms.