This page discusses some of the design issues and decisions of the Software Assurance Reference Dataset (SARD). SARD access and its manual are on-line. The SARD changes often. We appreciate and acknowledge those who contributed test cases to the SARD.
The purpose of the SARD is to provide consumers, researchers, and developers with a set of known weaknesses. This will allow consumers to evaluate tools and developers to test their methods. The dataset will encompass a wide variety of flaws, languages, platforms, and compilers. We want the dataset to become a broad effort, with examples contributed by many sources.
Within the SARD are relatively small, explicit test suites designated for a specific use, such as a minimum benchmark for Web penetration testers.
The dataset will eventually cover all phases of software: from initial concept, through design and implementation, to acceptance, deployment, and operation. Thus, although we talk mostly about source code, the dataset will eventually have models, designs, running programs, etc.
The SARD manual includes bug reports and suggestions for enhancements and improvements.
Conceptually the dataset has three parts.
The SARD acknowledgments and test case description page lists many of the people and groups who contributed test cases. We appreciate them and acknowledge their work and generosity. These test cases represent considerable intellectual effort to reduce reported vulnerabilities to examples, classify them, generate elaborations of particular flaws, come up with corresponding correct examples, etc.
The acknowledgments page also has more details about the test cases, their sources, links to papers explaining them, and other information.
The SARD is a huge repository of over 80 000 test cases.
What types of cases should be included?
Many test cases are designed to demonstrate one error but have additional errors. For instance, test case 1777 illustrates a hardcoded password, but it also uses gets(), which allows a buffer overflow.
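That combination might look like the following minimal C sketch (illustrative only, not the actual contents of test case 1777): the intended weakness is the hardcoded password, while the call to gets() introduces an unintended buffer overflow.

    #include <stdio.h>
    #include <string.h>

    /* Illustrative sketch only, not the actual contents of SARD test case 1777.
     * Intended weakness: a hardcoded password.
     * Unintended extra weakness: gets() performs an unbounded read, which
     * allows a buffer overflow of the stack buffer "entered".
     */
    int main(void)
    {
        char entered[16];

        printf("Password: ");
        if (gets(entered) == NULL)            /* unbounded read into entered[16] */
            return 1;

        if (strcmp(entered, "s3cr3t") == 0)   /* hardcoded password */
            printf("Access granted\n");
        else
            printf("Access denied\n");

        return 0;
    }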
Test cases that consist of the cleanest, most excellent code with great style (except for the weakness) minimize confounding concerns. These are useful for basic research and instruction.
Typical test cases, with style faults, non-portable code, and even other weaknesses or compile-time errors, most closely resemble real code.
We believe test cases should at least compile, so they can be executed, and shouldn't have extraneous weaknesses. Test Case Status has details of our review process. Poor style, design, or commenting is not forbidden.
Test suites should encompass a variety of flawed code, but they should also have corresponding fixed code for test cases. Fixed code is important for measuring false-positive rates. In many cases there are several possible solutions, so a flawed test case may have multiple fixed versions.
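As a hedged sketch of one possible fixed counterpart to the flawed example above, the unbounded gets() call could be replaced with a bounded fgets() read, and the password taken from the environment rather than hardcoded (the APP_PASSWORD variable name here is hypothetical). Other corrections, such as a different input API or a separate credential store, would be equally valid, which is why one flawed case can have several fixed versions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative fixed counterpart to the flawed sketch above: the unbounded
     * gets() is replaced by a bounded fgets() read, and the password is taken
     * from the environment instead of being hardcoded. APP_PASSWORD is a
     * hypothetical variable name; other corrections are equally valid.
     */
    int main(void)
    {
        char entered[64];
        const char *expected = getenv("APP_PASSWORD");

        if (expected == NULL)
            return 1;

        printf("Password: ");
        if (fgets(entered, sizeof entered, stdin) == NULL)
            return 1;
        entered[strcspn(entered, "\n")] = '\0';   /* strip the trailing newline */

        if (strcmp(entered, expected) == 0)
            printf("Access granted\n");
        else
            printf("Access denied\n");

        return 0;
    }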
A "bad" test case contains at least one weakness. The offending line(s) of code is highlighted in the display of a test case. The weakness is designated using Common Weakness Enumeration (CWE) entries. That is a CWE Identification number followed by an associated name, e.g. "CWE-121: Stack Based Buffer Overflow". Currently, the weakness names are based on CWE version 2.1. In case the associated CWE full name is too long to be displayed, a shortened name is used. However, the CWE Identification number is kept intact.
In the SARD, the CWE weakness name appears wherever a weakness needs to be reported, e.g., in the screen display of a test case, in the list of test cases, and in the zipped download of test cases.
The CWE weakness name is also used as a search criterion: a user can fetch all test cases with a specific weakness by searching on it.
Who can change test cases or test suites? When? Why?
To have long-term value, the content of a test case is "write once". That is, once source code or a binary is added to the SARD, it keeps the same name and never changes. This permanence allows research work to refer to, say, SARD test case 1552, knowing that exact code can always be retrieved. Later work can reliably get exactly what was used before.
What if there is a mistake in the code, for instance, there is a second, unintended weakness? The test case can be marked Deprecated and a reference made to a corrected version. Deprecated test cases should not be used for new work. They remain in the SARD as a reference to recheck old work.
The metadata associated with a test case could change. For instance, the description could be expanded or corrected. It may be useful to have a history of such changes so users can see if metadata has changed, what the changes are, and who did them.
Test suites are similarly "write once". Once they are designated, they should not change. A test suite might be superseded by an improved test suite, for example one that refers to test cases conforming to the latest language standard or that has better coverage.
The presumption is that test cases and test suites are examined by their authors before being submitted. If there is substantial doubt about the suitability or correctness of a test case, it should be resolved before submission. Test cases should be deprecated rarely. On the other hand, descriptive data about test cases, the "metadata", need not be vetted quite as rigorously.
A test suite is a set of test cases explicitly picked for some purpose. For instance, Test Suite #45 consists of 77 cases to test a source code security analyzer against functional requirements SCA-RM-1 through SCA-RM-5 specified in the "Source Code Security Analysis Tool Functional Specification".
Using test suites in the SARD allows different people to pick different sets for different reasons. Test suite designers should consider many questions, for instance:
What does it mean when a SARD test case is labeled "candidate"? What is the quality of that test case? Has it been reviewed or vetted in any way? What constitutes an "accepted" test case? And what if a test case is found to be incorrect or of poor quality? We provide information explaining what the status tag assigned to each test case tells you.
NIST is developing Computer Forensic Reference Data Sets (CFReDS) for digital evidence. These reference data sets provide an investigator documented sets of simulated digital evidence for examination.
Statistical Reference Datasets (StRD) are "reference datasets with certified values for a variety of statistical methods."
Other datasets, handbooks, and reference material are available to help "by mathematical modeling, design of methods, transformation of these methods into efficient numerical algorithms for high-performance computers and the implementation of these methods into high-quality mathematical software."
SAMATE is compiling a list of other assurance tool test suites and benchmarks.