The purpose of the reference dataset is to provide researchers (particularly in the SAMATE Project), software assurance tool developers, and end users with a set of artifacts with known software security errors and their fixes. The artifacts are designs, source code, binaries, etc., that is, from all phases of the software lifecycle. The samples include "synthetic" cases (written specifically to test tools), cases collected from "the wild" (production software), and academic cases (from university researchers). The dataset also contains real, production software applications with known bugs and vulnerabilities. This allows tool developers to test their methods and end users to evaluate a tool when considering it. The dataset is intended to encompass a wide variety of vulnerabilities, languages, platforms, and compilers. There is more information about the ideas behind the reference dataset on the Software Assurance Reference Dataset philosophy page. To access the set itself, visit https://samate.nist.gov/SARD/.
The dataset is a large effort with around 500 000 test cases and has benefited from many contributors. The contributing groups are detailed on the Acknowledgments and Test Case Descriptions page.
We welcome submissions of any software artifact with security vulnerabilities, as well as samples that avoid or mitigate such vulnerabilities. Although we intend to have security errors from the whole software lifecycle, the dataset concentrates on source code for now.
A test case consists of one or more files, which manifest the security error, and metadata about the file(s), such as the weakness type, language, etc. Contact us at samate [at] nist.gov to submit test cases.
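To make that structure concrete, here is a minimal sketch of how a test case's metadata might be modeled. The field names are illustrative assumptions, not the SARD's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    """Illustrative model of a SARD test case: one or more files plus metadata.
    Field names are assumptions for illustration, not the SARD's schema."""
    identifier: int                 # the test case ID shown in the SARD
    description: str                # short summary of the error or the fix
    language: str                   # e.g. "C", "Java", "PHP"
    weaknesses: list[str] = field(default_factory=list)  # e.g. CWE identifiers
    files: list[str] = field(default_factory=list)       # files that manifest the error
    status: str = "Candidate"       # "Candidate", "Accepted", or "Deprecated"
```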
Any user can search and download test cases. By default, all test cases in the SARD are displayed except deprecated ones. For now, you can download a test case from its individual test case page, or use the API button to get a list of download links for the displayed test cases. The API results are also paginated, so at the moment downloading many test cases requires automating the requests yourself.
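Until bulk download is built in, a script along the following lines could walk the paginated results and fetch each case. This is only a sketch: the endpoint path, query parameters, and response field names are assumptions, so inspect the API button's actual output before relying on them.

```python
"""Sketch of bulk-downloading SARD test cases via a paginated API.
The endpoint path, query parameters, and response field names below are
assumptions for illustration; check the real API responses before use."""
import pathlib

import requests

SEARCH_URL = "https://samate.nist.gov/SARD/api/test-cases/search"  # assumed endpoint
OUT_DIR = pathlib.Path("sard_downloads")
OUT_DIR.mkdir(exist_ok=True)

page = 1
while True:
    # Fetch one page of search results; the "page" parameter name is assumed.
    resp = requests.get(SEARCH_URL, params={"page": page}, timeout=30)
    resp.raise_for_status()
    results = resp.json()

    cases = results.get("testCases", [])     # assumed field holding this page's cases
    if not cases:
        break                                # past the last page

    for case in cases:
        link = case.get("download")          # assumed per-case download link
        if not link:
            continue
        archive = requests.get(link, timeout=60)
        archive.raise_for_status()
        (OUT_DIR / f"{case['identifier']}.zip").write_bytes(archive.content)

    page += 1
```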
Clicking a Test Case ID displays that test case.
You can search for test cases by criteria such as test case ID, test case description, language, weakness type, file name, etc.
A test suite is a pre-defined collection of test cases. Anyone can view and download entire test suites.
Test Suites on the navigation bar leads to a screen that displays test suites and lets you download them.
When a test case is first added, its status is "Candidate". After review, the status of a test case could be "Accepted". If a test case needs to be withdrawn, it is marked "Deprecated". It is still available, for historical purposes, but should not be used in any new work. See Test Case Status - What it Means for details of the review process.
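When consuming test case metadata programmatically, it is worth checking the status so that deprecated cases are excluded from new work. A minimal sketch, assuming each case carries a plain "status" string as in the illustrative model above:

```python
from enum import Enum

class Status(Enum):
    """Review states a SARD test case can be in."""
    CANDIDATE = "Candidate"    # newly added, awaiting review
    ACCEPTED = "Accepted"      # passed review
    DEPRECATED = "Deprecated"  # withdrawn; kept for historical purposes only

def usable(cases):
    """Drop deprecated test cases. Assumes each case is a dict with a
    "status" field (an assumed field name, as in the earlier sketch)."""
    return [c for c in cases if c.get("status") != Status.DEPRECATED.value]
```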
The SARD is continuing to evolve. Planned work falls into three groups: suggested near-term enhancements; major subsystems to be added, which will require many changes; and internal changes to the SARD that are not visible to users.
The collection was created in 2005 and was originally called the Standard Reference Dataset, abbreviated SRD [Black, "Software Assurance And Tool Evaluation", SERP 2005]. The name was quickly changed to Software Reference Dataset to be more specific, then to SAMATE Reference Dataset to be less presumptuous.
In 2013, we were notified that the abbreviation SRD conflicted with the federally legislated Standard Reference Data (SRD). Around the same time, we wanted an acronym that was less common in web searches. Several rounds of soliciting suggestions, brainstorming, polling, and discussion produced over two dozen possible new names. The new name, Software Assurance Reference Dataset, was suggested by Bertrand Stivalet and was announced at the Static Analysis Tool Exposition (SATE) V Experience Workshop on Friday, 14 March 2014.