Special Database 19 contains NIST's entire corpus of training materials for handprinted document and character recognition. It includes Handprinted Sample Forms from 3600 writers, 810,000 character images isolated from those forms, ground truth classifications for those images, reference forms for further data collection, and software utilities for image management and handling.
The features of this database are:
The database is NIST's largest and probably final release of images intended for handprint document processing and OCR research. The full page images are the default input to the NIST FORM-BASED HANDPRINT RECOGNITION SYSTEM, a public domain release of end-to-end recognition software.
2nd Edition – September 2016
Download – by_class.zip – MD5 hash file
Download – by_field.zip – MD5 hash file
Download – by_merge.zip – MD5 hash file
Download – by_write.zip – MD5 hash file
Download – hsf_page.zip – MD5 hash file
Users' Guide – PDF version
1st Edition – March 1995
Download – 1st Edition1995.zip – MD5 hash file
Users' Guide – PDF version
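Each archive listed above is paired with an MD5 hash file, so a download can be checked for corruption before it is unpacked. The following is a minimal sketch in Python, not an official NIST utility. It assumes the published digest has been copied out of the hash file by hand as a hexadecimal string, since the layout of the hash files is not described here; the function names `md5_of_file` and `matches_published_md5` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for large archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(archive_path, published_hex):
    """Compare a local archive against a published hex digest.

    Assumption: published_hex is the digest string copied from the
    MD5 hash file; surrounding whitespace and letter case are ignored.
    """
    return md5_of_file(archive_path) == published_hex.strip().lower()
```

A call such as `matches_published_md5("by_class.zip", digest_from_hash_file)` would return `True` only when the local file is byte-for-byte identical to the published one. MD5 is adequate for detecting transfer errors but should not be treated as protection against deliberate tampering.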
The EMNIST dataset is a set of handwritten letters and digits derived from NIST Special Database 19, converted to a 28x28 pixel image format and a dataset structure that directly matches the MNIST dataset.
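Because EMNIST mirrors the MNIST structure, a reader written for MNIST image files should in principle work on it as well. The sketch below parses the IDX image layout used by MNIST: a 16-byte big-endian header (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. It assumes the EMNIST distribution being used ships files in this binary layout and that any gzip compression has already been removed; `parse_idx_images` is a hypothetical helper name, not part of any NIST or EMNIST tool.

```python
import struct

IDX_IMAGE_MAGIC = 0x00000803  # 2051: unsigned-byte data, three dimensions


def parse_idx_images(data):
    """Parse IDX image bytes into a list of images.

    Each image is returned as a list of rows, and each row as a list
    of integer pixel values in the range 0..255.
    """
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number {magic:#010x}")
    if len(data) < 16 + count * rows * cols:
        raise ValueError("data is shorter than its header claims")

    images = []
    offset = 16
    for _ in range(count):
        image = [
            list(data[offset + r * cols: offset + (r + 1) * cols])
            for r in range(rows)
        ]
        images.append(image)
        offset += rows * cols
    return images
```

For a real file, the bytes would typically come from `open(path, "rb").read()`, or from `gzip.open(path, "rb").read()` if the file is still compressed. For data in the 28x28 format described above, each parsed image would have 28 rows of 28 pixel values.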
For more information please contact:
Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 6410
Gaithersburg, MD 20899-6410
(844) 374-0183 (Toll Free)
The scientific contact for this database is:
Patrick J. Grother
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-4157
patrick.grother [at] nist.gov
Keywords: Automated character recognition; automated data capture; character recognition; forms recognition; handwriting recognition; OCR; optical character recognition; software recognition.