Standard Reference Data, NIST:
100 Bureau Drive, Stop 8500
Gaithersburg, MD 20899-8500
Tel: 301-975-2200
Fax: 301-975-4553

If you have any questions regarding this website, or notice any problems or inaccurate information, please contact the webmaster by sending e-mail to: data@nist.gov

NIST Special Database 2

NIST Structured Forms Reference Set of Binary Images (SFRS)

Price: $90.00 (online purchase or fax/mail order)

Effective immediately, there is a minimum $30.00 shipping charge for all international shipments of databases via UPS International. Customers are responsible for their own duties, taxes, and VAT. Call 301-975-2200 or e-mail data@nist.gov if you have questions.

The NIST Structured Forms Database consists of 5,590 pages of binary, black-and-white images of synthesized documents.

The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251 together with Schedules A, B, C, D, E, F, and SE.

Eight of these forms contain two pages or form faces; therefore, there are 20 different form faces represented in the database.
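The face count above follows from simple arithmetic: each of the 12 forms contributes one face, and each two-page form contributes one more. A minimal sketch of that count is shown here; the function name is illustrative only, and the description does not say which eight forms have two pages, so only the totals are used.

```python
# Forms and schedules named in the description above.
FORMS = [
    "1040", "2106", "2441", "4562", "6251",
    "Schedule A", "Schedule B", "Schedule C", "Schedule D",
    "Schedule E", "Schedule F", "Schedule SE",
]

def count_form_faces(num_forms: int, num_two_page_forms: int) -> int:
    """Each form has one face; each two-page form adds a second face."""
    return num_forms + num_two_page_forms

total_faces = count_form_faces(len(FORMS), 8)
print(len(FORMS), total_faces)  # 12 20
```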

The document images in this database appear to be real forms prepared by individuals, but the images have been automatically derived and synthesized using a computer.

There are 900 simulated tax submissions represented in the database, averaging 6.2 form faces per submission. The database totals approximately 5.9 gigabytes of uncompressed image data and also includes image format documentation and example software.
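These figures can be checked with a rough calculation. The sketch below assumes each form face is a US Letter page (8.5 x 11 inches) stored at 1 bit per pixel, and uses the 300 pixel/inch resolution and 5,590 image count listed among the database features. The page size and bit depth are assumptions made for illustration and are not stated in this description.

```python
NUM_SUBMISSIONS = 900
NUM_IMAGES = 5590
PIXELS_PER_INCH = 300

# Assumptions (not stated in the description): US Letter pages, 1 bit per pixel.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0
BITS_PER_PIXEL = 1

width_px = round(PAGE_WIDTH_IN * PIXELS_PER_INCH)    # 2550
height_px = round(PAGE_HEIGHT_IN * PIXELS_PER_INCH)  # 3300

bytes_per_image = width_px * height_px * BITS_PER_PIXEL / 8
total_gigabytes = NUM_IMAGES * bytes_per_image / 1e9
faces_per_submission = NUM_IMAGES / NUM_SUBMISSIONS

print(f"{faces_per_submission:.1f} form faces per submission")  # 6.2
print(f"{total_gigabytes:.1f} GB of uncompressed image data")   # 5.9
```

Under these assumptions the estimate agrees with the stated totals, which suggests the 5.9 gigabyte figure refers to the bilevel images before any compression.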

The database has the following features:

  • 900 simulated tax submissions
  • 5,590 images of completed structured form faces
  • 300 pixel/inch resolution
  • 5,590 text files containing entry field answers
  • 20 tables of entry field types and contexts
  • image format documentation and example software

Suitable for both document processing and automated data capture research, development, and evaluation, the data set can be used for:

  • forms identification
  • field isolation: locating the entry fields on the form
  • character segmentation: separating entry field values into characters
  • character recognition: identifying specific machine-printed characters

This database is a valuable tool for measurement of system performance and system comparison on complex forms.
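As one illustration of how such a measurement might be set up, the sketch below scores a recognition system's output for a single form face against the reference entry field answers. The field names, values, and dictionary layout are hypothetical; the actual layout of the answer files is described in the Users' Guide, not here.

```python
def field_accuracy(reference: dict[str, str], recognized: dict[str, str]) -> float:
    """Return the fraction of reference fields whose recognized value matches exactly."""
    if not reference:
        return 0.0
    correct = sum(
        1 for name, value in reference.items()
        if recognized.get(name) == value
    )
    return correct / len(reference)

# Hypothetical field names and values, for illustration only.
reference = {"name": "JOHN DOE", "wages": "25000", "filing_status": "1"}
recognized = {"name": "JOHN DOE", "wages": "2S000", "filing_status": "1"}

accuracy = field_accuracy(reference, recognized)
print(f"{accuracy:.2f}")  # 2 of 3 fields match
```

Exact-match scoring at the field level is only one possible choice; a character-level measure would give partial credit for the misread "wages" field.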

System Requirements: CD-ROM drive with software to read ISO-9660 format.

A PDF version of the Users' Guide is available.

For more information on Special Database 2 please contact:

Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 8500
Gaithersburg, MD 20899-2310
(301) 975-2200 (VOICE) / (301) 975-4553 (FAX)


The scientific contact for this database is:

Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-2928
michael.garris@nist.gov

Keywords: ASCII Reference, automated character recognition, automated data capture, Binary Image Database, forms identification, image format documentation, IRS, NIST, Machine Print, OCR, optical character recognition, printed characters, software recognition, synthesized documents, tax forms