Contact

Standard Reference Data, NIST:
100 Bureau Drive, Stop 2300
Gaithersburg, MD 20899-2300

Tel: 301-975-2200
Fax: 301-926-0416

If you have any questions regarding this website, or notice any problems or inaccurate information, please contact the webmaster by sending e-mail to: data@nist.gov

NIST Special Database 2

NIST Structured Forms Reference Set of Binary Images (SFRS)

Price: $90.00 (order online or by fax/mail)

Effective immediately, a minimum $30.00 shipping charge applies to all international shipments of databases via UPS International. Customers are responsible for their own duties, taxes, and VAT. Contact 301-975-2200 or data@nist.gov if you have questions.

The NIST Structured Forms Database consists of 5,590 pages of binary, black-and-white images of synthesized documents.

The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251 together with Schedules A, B, C, D, E, F, and SE.

Eight of these forms have two pages (form faces), so the database represents 20 different form faces.

The document images in this database appear to be real forms prepared by individuals, but the images have been automatically derived and synthesized using a computer.

There are 900 simulated tax submissions represented in the database averaging 6.2 form faces per submission. This significant new database totals approximately 5.9 gigabytes of uncompressed image data including image format documentation and example software.

The database has the following features:

  • 900 simulated tax submissions
  • 5,590 images of completed structured form faces
  • 300 pixel/inch resolution
  • 5,590 text files containing entry field answers
  • 20 tables of entry field types and contexts
  • image format documentation and example software
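The figures quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes each image is a US Letter page (8.5 x 11 inches) stored uncompressed at 1 bit per pixel, which this page does not state, and it reports size in decimal gigabytes. The function and key names are made up for the example.

```python
# Counts taken from the database description.
FORMS = ["1040", "2106", "2441", "4562", "6251"]
SCHEDULES = ["A", "B", "C", "D", "E", "F", "SE"]
TWO_PAGE_FORMS = 8
SUBMISSIONS = 900
IMAGES = 5590
PIXELS_PER_INCH = 300

# ASSUMPTION (not stated in the description): US Letter pages, 1 bit per pixel.
PAGE_INCHES = (8.5, 11.0)


def summarize():
    """Derive the headline figures from the raw counts."""
    form_count = len(FORMS) + len(SCHEDULES)
    # Every form contributes one face; two-page forms contribute one more.
    form_faces = form_count + TWO_PAGE_FORMS
    faces_per_submission = IMAGES / SUBMISSIONS

    width_px = round(PAGE_INCHES[0] * PIXELS_PER_INCH)
    height_px = round(PAGE_INCHES[1] * PIXELS_PER_INCH)
    bytes_per_image = width_px * height_px // 8  # 8 binary pixels per byte
    estimated_total_gb = IMAGES * bytes_per_image / 1e9  # decimal gigabytes

    return {
        "form_count": form_count,
        "form_faces": form_faces,
        "faces_per_submission": faces_per_submission,
        "bytes_per_image": bytes_per_image,
        "estimated_total_gb": estimated_total_gb,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under these assumptions the estimate is consistent with the stated counts of 12 forms, 20 form faces, about 6.2 faces per submission, and roughly 5.9 gigabytes of uncompressed image data.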


Suitable for both document processing and automated data capture research, development, and evaluation, the data set can be used for:

  • forms identification
  • field isolation: locating the entry fields on the form
  • character segmentation: separating entry field values into characters
  • character recognition: identifying specific machine-printed characters


This database is a valuable tool for measurement of system performance and system comparison on complex forms.
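As a rough illustration of how such a performance measurement might work, the sketch below scores a recognition system's output for one form face against the reference entry field answers. It is a minimal sketch, not the database's own software: the dictionary layout, the field names, and the `field_accuracy` function are hypothetical, and the actual answer file format is described in the Users' Guide rather than assumed here.

```python
def field_accuracy(reference, recognized):
    """Fraction of reference fields whose recognized value matches exactly.

    Both arguments map a field name to its text value (hypothetical layout).
    Fields missing from `recognized` count as errors.
    """
    if not reference:
        return 0.0
    correct = sum(
        1
        for name, value in reference.items()
        if recognized.get(name, "").strip() == value.strip()
    )
    return correct / len(reference)


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    reference = {"name": "JOHN DOE", "wages": "25000", "filing_status": "1"}
    recognized = {"name": "JOHN DOE", "wages": "25008"}
    print(f"field accuracy: {field_accuracy(reference, recognized):.3f}")
```

Exact-match scoring at the field level is the strictest simple choice; a character-level measure would give partial credit for near misses and may suit comparisons of character recognition alone.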

System Requirements: CD-ROM drive with software to read ISO-9660 format.

A PDF version of the Users' Guide is available.

For more information on Special Database 2 please contact:

Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 2300
Gaithersburg, MD 20899-2300
(301) 975-2200 (VOICE) / (301) 926-0416 (FAX)

The scientific contact for this database is:

Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-2928
michael.garris@nist.gov


Keywords: ASCII Reference, automated character recognition, automated data capture, Binary Image Database, forms identification, image format documentation, IRS, NIST, Machine Print, OCR, optical character recognition, printed characters, software recognition, synthesized documents, tax forms