
Software Assurance Reference Dataset (SARD) Manual


Preamble

The purpose of the reference dataset is to provide researchers (particularly in the SAMATE Project), software assurance tool developers, and end users with a set of artifacts with known software security errors and fixes for them. The artifacts are designs, source code, binaries, etc., that is, from all phases of the software lifecycle. The samples include "synthetic" cases (written as tests), cases collected from "the wild" (production code), and academic cases (from university researchers). The dataset also contains real, production software applications with known bugs and vulnerabilities. This allows developers to test their methods and end users to evaluate a tool when considering it. The dataset intends to encompass a wide variety of vulnerabilities, languages, platforms, and compilers. There is more information about the ideas behind the reference dataset on the Software Assurance Reference Dataset philosophy page. To access the set itself, visit https://samate.nist.gov/SARD/.

The dataset is a large effort with around 500,000 test cases. It has benefited from many contributors. The groups of contributions are detailed on the Acknowledgments and Test Case Descriptions page.

Eligibility of a Test Case

Any software artifact with security vulnerabilities is welcome to be submitted. Samples that avoid or mitigate such vulnerabilities are also welcome. Although we intend to cover security errors from the whole software lifecycle, this dataset concentrates on source code for now.

Submit Test Cases

A test case consists of one or more files, which manifest the security error, and metadata about the file(s), such as the weakness type, language, etc. Contact us at samate [at] nist.gov to submit test cases.
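
To make this concrete, here is an illustrative sketch of the metadata a submission might carry, written as a Python dictionary. The field names and values are assumptions for illustration, not the SARD's actual submission schema.

    # Illustrative metadata for one test case; field names are assumptions,
    # not the SARD's actual schema.
    test_case = {
        "description": "Stack-based buffer overflow in a string copy",
        "weakness": "CWE-121",         # weakness type
        "language": "c",
        "files": ["src/example.c"],    # file(s) that manifest the error
        "status": "candidate",         # initial review status (see Review Process)
    }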

View, Search, and Download Test Cases

Any user can search and download test cases. All test cases in the SARD are displayed by default, except deprecated test cases. For now, you can download test cases from their individual test case pages, or use the API button to get a list of download links for the displayed test cases. The API results are also paginated, so at the moment you must automate the download of many test cases yourself (a minimal sketch of such automation follows).
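
As a rough illustration of that automation, the Python sketch below pages through a search endpoint and prints each test case's download link. The endpoint URL, query parameters, and response field names here are assumptions made for illustration; consult the SARD site for the actual API.

    # Sketch: walk paginated API results and collect download links.
    # Endpoint and field names are assumptions, not the documented API.
    import json
    import urllib.request

    API_URL = "https://samate.nist.gov/SARD/api/test-cases/search"  # hypothetical

    def fetch_page(page):
        # Fetch one page of results; assumes a JSON body with a "testCases" list.
        with urllib.request.urlopen(f"{API_URL}?page={page}&limit=100") as resp:
            return json.load(resp)

    def all_download_links():
        # Keep requesting pages until one comes back empty.
        page = 1
        while True:
            cases = fetch_page(page).get("testCases", [])
            if not cases:
                return
            for case in cases:
                yield case.get("download")  # assumed field holding the archive URL
            page += 1

    if __name__ == "__main__":
        for link in all_download_links():
            print(link)

The printed links could then be fed to any batch download tool.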

Clicking a Test Case ID displays that test case.

You can search for test cases according to criteria such as test case ID, description, language, weakness type, filename, etc.

Test Suite

A test suite is a pre-defined collection of test cases. Anyone can view and download entire test suites.

  • Display/Download

Test suite on the navigation bar leads to a screen that can display and download test suites.

Review Process

When a test case is first added, its status is "Candidate". After review, the status of a test case could be "Accepted". If a test case needs to be withdrawn, it is marked "Deprecated". It is still available, for historical purposes, but should not be used in any new work. See Test Case Status - What it Means for details of the review process.

Enhancements

The SARD is continuing to evolve. Suggested near-term enhancements include:

Major Subsystems

These are major subsystems to be added, which will require many changes.

  • Enhance SARD architecture to support "complex" test cases, for example,
    1. a single, large case may have many pieces that are updated from time to time. The best we can do now is (a) one huge test case, which is deprecated whenever a piece changes, or (b) a test suite composed of test cases, where the user would have to know how to collect and combine them for use.
  • Support Easy Running of Downloaded Test Cases
    • Indicate which file(s) is the argument to the test script
      • Add "target file(s)" field to test case metadata
    • Indicate the expected result (e.g., good or bad, and where)
      • Populate the Expected Output field in the metadata
      • As above, collect this info in some easily (mechanically) accessible way.
    • The sample tool script should have an option to do all the downloaded target files at once, that is, in one tool invocation.
  • Present all downloaded test cases as one big program. That is, there is one main(). (A minimal sketch of this idea follows the list.)
    • "Compile" instructions would have to indicate each subdirectory, or, everything would have to be put in one directory.
    • This could be an option. With this, a source code analyzer would only have to be started once, which may save considerable time.
    • Each test case would have to be a procedure, which could be called by a (synthesized) main().
    • This only applies to source code artifacts, not binaries, designs, etc.
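
As a minimal sketch of the "one big program" idea, assume each test case's entry point has already been renamed from main() to a unique procedure (here case_<id>()); a small generator can then emit a driver whose synthesized main() calls each one. The directory layout, naming convention, and test case IDs below are hypothetical.

    # Sketch: synthesize a driver.c whose main() calls one procedure per
    # test case. Assumes each case's entry point was renamed to case_<id>().
    from pathlib import Path

    def synthesize_driver(case_dirs, out_file="driver.c"):
        decls, calls = [], []
        for d in case_dirs:
            name = f"case_{Path(d).name}"      # e.g. case_1001 for directory 1001/
            decls.append(f"void {name}(void);")
            calls.append(f"    {name}();")
        body = ("\n".join(decls)
                + "\n\nint main(void) {\n"
                + "\n".join(calls)
                + "\n    return 0;\n}\n")
        Path(out_file).write_text(body)

    synthesize_driver(["1001", "1002"])  # hypothetical test case directories

The compile instructions would then point at driver.c plus each test case's sources, so a source code analyzer need be started only once.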

Search Features

  • Add ability to search test cases by:
    • test case id
    • file size
    • full-text search of description - natural language search, Boolean operators.
    • other metadata
  • Optional case-sensitive search
  • Allow Boolean connections between search criteria (a client-side sketch follows this list)
  • List all available search criteria in the SARD documentation
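
Until such Boolean connections exist server-side, a compound filter can be applied client-side to downloaded metadata. The sketch below assumes a local manifest.json holding a list of test case records; the filename and field names are assumptions.

    # Sketch: Boolean combination of search criteria over a local manifest.
    # The manifest layout (a JSON list of records) is an assumption.
    import json

    with open("manifest.json") as f:
        cases = json.load(f)

    # (language == "c" AND weakness == "CWE-121") OR description mentions "overflow"
    hits = [c for c in cases
            if (c.get("language") == "c" and c.get("weakness") == "CWE-121")
            or "overflow" in c.get("description", "").lower()]
    print(len(hits), "matching test cases")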

Other Enhancements

  • Users should be able to download information about all the test suites. (7 May 2007)
    • They can download all test cases, with a manifest, but not the test suites. That manifest, or another manifest, could include all test suites and their information.
  • Visit statistics for each web page (available through Special pages, Popular pages) and for each test case
    • This is partially done: there are statistics about visitors and pages, but not really about test cases.
  • Add a button to select (or download) all test cases in the search or display set, so a user can search and then easily download them all.
  • It is currently possible to upload the same file twice for a given test case; this should be prevented.

Internal Enhancements

These are internal to the SARD. They are not visible to users.

  • Document the tests (for regression testing)

History

The collection was created in 2005 and was originally called the Standard Reference Dataset, abbreviated SRD [Black, "Software Assurance And Tool Evaluation", SERP 2005]. The name was quickly changed to Software Reference Dataset to be more specific, then to SAMATE Reference Dataset to be less presumptuous.

In 2013, we were notified that the abbreviation SRD conflicted with the federally legislated Standard Reference Data (SRD). Around the same time, we came to want an acronym that was less common in web searches. Several rounds of soliciting suggestions, brainstorming, polling, and discussion produced over two dozen possible new names. The new name, Software Assurance Reference Dataset, was suggested by Bertrand Stivalet and was announced at the Static Analysis Tool Exposition (SATE) V Experience Workshop on Friday, 14 March 2014.
