The National Institute of Standards and Technology (NIST) has developed and maintains evaluated libraries of reference data for the identification of chemicals by gas chromatography/mass spectrometry. These include the NIST/EPA/NIH Mass Spectral Library of electron ionization spectra and the NIST GC Retention Index (RI) Database. The databases are available from distributors for integration with existing instrumentation (http://chemdata.nist.gov/mass-spc/msms-search/). The number of compounds and spectra in the newly released NIST23 has increased by 40,000 since the preceding 2020 release.
Intended Impact
The speed and accuracy of chemical analyses are vital issues in many areas. Monitoring for explosives, drugs (both legal and illegal), chemical weapons, pollutants, petrochemical processing, food contamination, and health/disease status biomarkers is a challenge with a high reward in terms of quality of life and security for the nation.
The first step in chemical analysis is usually identifying the chemical or chemicals present, and the single most general technique for identifying molecules is mass spectrometry. Of the numerous methods for ionizing materials to generate a mass spectrum, one of the oldest and best established is electron ionization (EI). Although a mass spectrum reflects the underlying properties of the molecule, the process is sufficiently complex that identification often requires comparing the spectrum of an unknown with a measured spectrum of a known compound, which serves as a ‘fingerprint’ of the molecule.
The purpose of both the Electron Ionization (EI) Library, which is part of the NIST/EPA/NIH Mass Spectral Library, and the Retention Index (RI) Database is to aid in the identification of chemicals. In a substantial number of cases the EI Library is part of the only feasible means of identification, and it always makes the process faster and more reliable.
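As an illustration of comparing the spectrum of an unknown with library spectra, the following Python sketch scores candidates with a plain cosine similarity over peak intensities. This is a generic measure chosen for illustration only and should not be read as the matching algorithm used in NIST software; the function names and the toy spectra are hypothetical.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(unknown, library):
    """Return (name, score) for the library entry most similar to the unknown."""
    scored = ((name, cosine_similarity(unknown, ref)) for name, ref in library.items())
    return max(scored, key=lambda pair: pair[1])


# Made-up toy values, not real reference spectra.
library = {
    "compound A": {43: 100.0, 58: 35.0, 71: 12.0},
    "compound B": {51: 40.0, 77: 100.0, 105: 60.0},
}
unknown = {43: 95.0, 58: 40.0, 71: 10.0}

name, score = best_match(unknown, library)
print(f"best match: {name} (score {score:.3f})")
```

A score near 1 means the two peak patterns are nearly proportional; a score of 0 means the spectra share no peaks.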
Objective
The Standard Reference Data Act (Pub. L. 90-396, Sec. 1, July 11, 1968) states: "The Congress hereby finds and declares that reliable standardized scientific and technical reference data are of vital importance to the progress of the Nation's science and technology. It is therefore the policy of the Congress to make critically evaluated reference data readily available to scientists, engineers, and the general public. It is the purpose of this Act to strengthen and enhance this policy."
The objective of the EI Library is to provide the largest possible collection of reliable reference spectra, with an emphasis on compounds that are biologically, environmentally, or industrially relevant. The Retention Index (RI) Database provides a comprehensive collection of reported retention indices, which enable more reliable identifications.
Goals
EI Library: The goal is to improve the utility of a tool that is widely used to identify chemical compounds. This is accomplished by increasing the scope and quality of relevant spectra in the library and by providing capable, easy-to-use software.
RI Database: The goal is to make available, in an easily accessible form, as much published and NIST-measured retention index data as possible to aid in the identification of compounds by gas chromatography.
Research Activities and Technical Approach
EI Library
The current primary means of enhancing the size and scope of the EI Library is through direct measurement at NIST. This is done after a thorough analysis of chemicals that are both significant and available. On occasion, specialized collections are acquired and spectra are added through both solicited and spontaneous contributions.
Quality control is essential since the reliability of any identification depends on the spectra being of the correct compounds and of high quality. Although some aspects of quality control can be and are automated, the final decision to add a spectrum to the library must be made by experts.
An essential element of the NIST/EPA/NIH Mass Spectral Library is the software necessary to evaluate and use the library effectively. Past developments include the addition of synonyms for chemical names, the addition of chemical structures and Chemical Abstracts registry numbers, the development and implementation of spectral matching algorithms, the development of a retention index database with values linked to the appropriate spectra, and the development of AMDIS, a program that extracts clean spectra from complex gas chromatography/mass spectrometry data for comparison to library spectra. A variety of other search and analysis software central to the NIST evaluation process has been developed at NIST and is also made available to library users. NIST search and analysis software is available on virtually all relevant commercial mass spectrometry instrumentation. In recent years, all compounds have been represented by the IUPAC International Chemical Identifier (InChI), which was developed for this purpose at NIST and is now routinely used for general chemical identification worldwide.
The various utilities associated with this library are described here.
RI Database
The RI Database is the largest available collection of retention indices. The data were collected from the published literature and derived alongside NIST measurements of EI spectra. As with the NIST/EPA/NIH Mass Spectral Library, the software provides chemical names, synonyms, Chemical Abstracts registry numbers, and structures. The two collections are integrated through NIST software.
The current (2023) version of the EI Library contains 394,054 EI spectra of 347,100 compounds. Over the last decade, more than 10,000 spectra have been added each year. Approximately 5,000 copies of the NIST EI Library are provided each year to end users through distributors. Most manufacturers of mass spectrometers offer the Database as an option, and a large proportion of instruments capable of generating EI spectra are delivered with the Database.
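For readers unfamiliar with retention indices, the following Python sketch shows how one is commonly computed by bracketing the analyte between two consecutive n-alkanes. It covers two standard definitions: the Kovats index for isothermal runs (logarithmic interpolation of adjusted retention times) and the linear index of van den Dool and Kratz for temperature-programmed runs. The text above does not state which definitions the database values follow, so this is background illustration only; the function names and the example retention times are hypothetical.

```python
import math


def kovats_index(t_x, t_n, t_n1, n):
    """Isothermal Kovats index from adjusted retention times.

    t_x:  adjusted retention time of the analyte
    t_n:  adjusted retention time of the n-alkane with n carbons
    t_n1: adjusted retention time of the n-alkane with n + 1 carbons
    """
    if not 0 < t_n <= t_x <= t_n1 or t_n == t_n1:
        raise ValueError("analyte must elute between the two bracketing alkanes")
    fraction = (math.log(t_x) - math.log(t_n)) / (math.log(t_n1) - math.log(t_n))
    return 100.0 * (n + fraction)


def linear_retention_index(t_x, t_n, t_n1, n):
    """Linear (temperature-programmed) index from retention times."""
    if not t_n <= t_x <= t_n1 or t_n == t_n1:
        raise ValueError("analyte must elute between the two bracketing alkanes")
    return 100.0 * (n + (t_x - t_n) / (t_n1 - t_n))


# Made-up times in minutes: analyte eluting between decane (n = 10) and undecane.
print(linear_retention_index(12.5, 10.0, 15.0, 10))
print(kovats_index(8.0, 4.0, 16.0, 10))
```

By construction, an n-alkane with n carbons has an index of 100 × n under either definition, which makes values comparable across instruments and conditions to a useful degree.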
The GC Retention Index Database contains 491,790 retention index values for 180,610 compounds. The retention index values for which there is a corresponding mass spectrum are included in the NIST/EPA/NIH Mass Spectral Library.
A 2023 version of the Tandem (MS/MS) Library of small molecules, containing 2,374,064 spectra of 399,267 precursor ions from 51,501 chemical compounds, is available independently of the EI Library and is described separately.
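The counts quoted above can be put on a common footing by dividing the number of entries in each collection by the number of compounds it covers. The short Python sketch below does only that arithmetic; the dictionary layout and function name are illustrative, and the figures are copied from the text.

```python
# (entries, compounds) for each collection, as quoted in the text above.
collections = {
    "EI Library": (394_054, 347_100),
    "GC Retention Index Database": (491_790, 180_610),
    "Tandem (MS/MS) Library": (2_374_064, 51_501),
}


def entries_per_compound(entries, compounds):
    """Average number of spectra (or RI values) per compound."""
    return entries / compounds


averages = {
    name: entries_per_compound(entries, compounds)
    for name, (entries, compounds) in collections.items()
}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f} entries per compound")
```

The averages come out at roughly 1.1 EI spectra, 2.7 retention index values, and 46 tandem spectra per compound.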