NIST Special Database 19

NIST Handprinted Forms and Characters Database

Special Database 19 contains NIST's entire corpus of training materials for handprinted document and character recognition. It publishes Handprinted Sample Forms (HSF) from 3600 writers, 810,000 character images isolated from their forms, ground truth classifications for those images, reference forms for further data collection, and software utilities for image management and handling.

The features of this database are:

  • Final accumulation of NIST's handprinted sample data
  • Full page HSF forms from 3600 writers
  • Separate digit, upper and lower case, and free text fields
  • Over 800,000 images with hand checked classifications
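The separate digit, uppercase and lowercase fields give 62 character classes (10 + 26 + 26). The sketch below shows one way to map between class folder names and characters. It assumes, without confirmation from this page, that class folders are named by the two-digit hexadecimal ASCII code of each character (for example 4a for J). Check the Users' Guide before relying on that. The function names are hypothetical.

```python
import string

# Assumption (not stated on this page): each class folder is named by the
# two-digit hexadecimal ASCII code of its character, e.g. "30" -> '0',
# "4a" -> 'J', "7a" -> 'z'. Verify against the Users' Guide.
DIGITS = string.digits
UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase
CLASSES = DIGITS + UPPER + LOWER  # 10 + 26 + 26 = 62 classes


def folder_to_char(folder: str) -> str:
    """Decode a hex folder name (either letter case) to its character."""
    ch = chr(int(folder, 16))
    if ch not in CLASSES:
        raise ValueError(f"{folder!r} does not name a digit or letter class")
    return ch


def char_to_folder(ch: str) -> str:
    """Encode a class character as a lowercase two-digit hex folder name."""
    if len(ch) != 1 or ch not in CLASSES:
        raise ValueError(f"{ch!r} is not one of the 62 classes")
    return f"{ord(ch):02x}"


def char_group(ch: str) -> str:
    """Report whether a class character is a digit, uppercase or lowercase."""
    if len(ch) != 1:
        raise ValueError(f"{ch!r} is not a single character")
    if ch in DIGITS:
        return "digit"
    if ch in UPPER:
        return "upper"
    if ch in LOWER:
        return "lower"
    raise ValueError(f"{ch!r} is not one of the 62 classes")


if __name__ == "__main__":
    for name in ["30", "4a", "7a"]:
        ch = folder_to_char(name)
        print(name, ch, char_group(ch))
```

Decoding accepts either hex letter case, since `int(..., 16)` does, while encoding always emits lowercase; adjust `char_to_folder` if the actual folder names turn out to be uppercase.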


    The database is NIST's largest and probably final release of images intended for handprint document processing and OCR research. The full page images are the default input to the NIST FORM-BASED HANDPRINT RECOGNITION SYSTEM, a public domain release of end-to-end recognition software.

2nd Edition – September 2016

Download – by_class.zip (MD5 hash file)
Download – by_field.zip (MD5 hash file)
Download – by_merge.zip (MD5 hash file)
Download – by_write.zip (MD5 hash file)
Download – hsf_page.zip (MD5 hash file)
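Each archive is paired with an MD5 hash file, so a download can be checked before it is unpacked. The following is a minimal sketch. It assumes only that the hash file contains the 32-hex-digit digest somewhere in its text, because the file's exact layout is not described here. The helper names are hypothetical.

```python
import hashlib
import re
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large archives never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(hash_file):
    """Return the first standalone 32-hex-digit token in the hash file.

    Assumption: the published hash file holds the digest as plain hex text
    (as md5sum-style files do); its exact layout is not documented here.
    """
    text = Path(hash_file).read_text(errors="replace")
    match = re.search(r"\b[0-9a-fA-F]{32}\b", text)
    if match is None:
        raise ValueError(f"no MD5 digest found in {hash_file}")
    return match.group(0).lower()


def verify(archive, hash_file):
    """True when the archive's MD5 matches the digest in the hash file."""
    return md5_of_file(archive) == expected_md5(hash_file)
```

For example, `verify("by_class.zip", "by_class.md5")` would check the first archive, where `by_class.md5` stands for whatever name the hash file was saved under. Reading in fixed-size chunks keeps memory use flat regardless of archive size.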

View the PDF version of the Users' Guide.

Example HSF image: the file hsf_page/hsf_0/f0002_01.pct. Notice that the first field on this form, the name field, has been intentionally occluded; on some other forms it remains blank. All fields except those on the first line have been segmented and

1st Edition – March 1995

Download – 1st Edition1995.zip (MD5 hash file)

View the PDF version of the 1st Edition Users' Guide.

The EMNIST dataset is a set of handwritten characters and digits derived from NIST Special Database 19 and converted to a 28x28 pixel image format and a dataset structure that directly match the MNIST dataset.
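The conversion to a 28x28 format described above can be illustrated with a minimal pure-Python sketch. It is not the published EMNIST procedure. It only crops a character to its ink, centers it on a square canvas with a small margin, and shrinks it by simple block averaging. It assumes the input is a binary image given as a list of rows of 0/1 values. The function names and the 2-pixel margin are hypothetical choices.

```python
def bounding_box(img):
    """Return (top, bottom, left, right) of the ink; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        raise ValueError("image contains no ink")
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def center_in_square(img, pad=2):
    """Crop to the ink and paste it centered on a square canvas with a margin."""
    top, bottom, left, right = bounding_box(img)
    crop = [row[left:right] for row in img[top:bottom]]
    h, w = len(crop), len(crop[0])
    side = max(h, w) + 2 * pad
    canvas = [[0.0] * side for _ in range(side)]
    dr, dc = (side - h) // 2, (side - w) // 2
    for r in range(h):
        for c in range(w):
            canvas[dr + r][dc + c] = float(crop[r][c])
    return canvas


def block_downsample(img, size=28):
    """Shrink a square image to size x size by averaging each source block.

    Block edges use floor for the start and ceil for the end, so every
    target pixel covers at least one source pixel; values are scaled to 0..255.
    """
    n = len(img)
    out = []
    for i in range(size):
        r0, r1 = i * n // size, -(-(i + 1) * n // size)
        row_out = []
        for j in range(size):
            c0, c1 = j * n // size, -(-(j + 1) * n // size)
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row_out.append(round(255 * sum(block) / len(block)))
        out.append(row_out)
    return out


def to_28x28(img, pad=2):
    """Crop, center and shrink one binary character image to 28x28 grayscale."""
    return block_downsample(center_in_square(img, pad), 28)
```

Block averaging turns the binary strokes into gray levels at their edges, which is why the output is grayscale even though the input is binary. A production pipeline would more likely use a proper resampling filter from an imaging library.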

For more information please contact:
Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 6410
Gaithersburg, MD 20899-6410
(844) 374-0183 (Toll Free) 


The scientific contact for this database is:
Patrick J. Grother
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-4157
patrick.grother [at] nist.gov

Keywords: Automated character recognition; automated data capture; character recognition; forms recognition; handwriting recognition; OCR; optical character recognition; software recognition.

DOI: http://doi.org/10.18434/T4H01C

Created August 27, 2010, Updated April 27, 2019