Human Factors Test Suite Version 2.0-2 for the Usability, Accessibility, and Privacy Requirements of the VVSG-NI, Version 2.0

Introduction

The purpose of this document is to describe specific test methods for all the Usability and Accessibility requirements within the VVSG (Part 1, Chapter 3). For each such requirement, there are instructions for how a test lab (or any other testing agent) should go about determining whether or not the voting system under test (VSUT) meets that requirement. (Of course, as with all conformance testing, one cannot be certain that a given system meets the requirements in all circumstances, only that the VSUT is successful under the particular conditions actually tested. However, a failed test does constitute proof that the VSUT does not meet the requirement.)

Intro.1 Background of VVSG Testing

By authorization of the 2002 Help America Vote Act (HAVA), NIST is assisting the Election Assistance Commission (EAC) with the implementation of Voluntary Voting System Guidelines (VVSG) for states and local governments conducting Federal elections. The EAC's Technical Guidelines Development Committee (TGDC), in collaboration with NIST researchers, has developed a draft of the next iteration of the VVSG. The draft document is a set of detailed technical requirements addressing core requirements, human factors, privacy, security, and transparency of the next generation of voting systems. The EAC plans to issue the next VVSG after receiving and reviewing public comments.

NIST is developing a set of uniform public test suites to be used as part of the EAC's Testing and Certification Program. Test labs will be able to use these freely available test suites to help determine that VVSG requirements are met by voting systems.
The test suites address human factors, security, and core functionality requirements for voting systems as specified in the VVSG. Use of the public test suites will produce consistent results and promote transparency of the testing process. The test suites can also assist manufacturers in the development of conforming products by providing precise test specifications, and they will help reduce the cost of testing, since each test lab will no longer need to develop its own test suites. Finally, a uniform set of public test suites can increase election officials' and voters' confidence that voting systems conform to VVSG requirements.

Intro.2 Structure of this Document

Following this introductory section, Part 1 lists each requirement number and title, followed by either a Single Requirement Test Method (SRTM) or a Combined Requirement Test Method (CRTM). An SRTM directly describes the way in which its requirement is to be tested. However, it is often much easier to test a group of requirements together in a single test scenario. In those cases, the requirements within the group all have a link pointing to the CRTM that they share. Part 2 of this document describes those CRTMs. Each CRTM description includes a list of the requirements it covers. In a few cases, a requirement may be tested in more than one CRTM; if so, it will have a list of links to point to all the relevant CRTMs. Thus, there is a many-to-many relation between the requirements and CRTMs. Throughout this document, we use "test method" to mean either an SRTM or a CRTM.

If you are using the interactive HTML version of this document, here are the navigation rules:

- A link that is a requirement number (e.g. "3.2.2-A") will jump to the entry in Part 1 for that requirement.
- A link that is a requirement title (e.g. "Ballot Editing per Contest") will jump to the entry for that requirement in the VVSG itself, either in a separate browser tab or a separate window (depending on your browser settings).
- A link that is a CRTM title (e.g. "Editable Ballot Session") will jump to the entry in Part 2 for that CRTM.

Also, note that there is a table of contents at the beginning of both Part 1 and Part 2.

Intro.3 Tester Qualifications

All the tests require general familiarity with voting systems and procedures, with conformance testing, with the requirements of the VVSG, and with usability and human factors. Certain tests also require special expertise (such as the operation of technical equipment). When such expertise is called for, it will be noted explicitly within the test method. Furthermore, since many of the tests involve the tester acting as a voter, the tester should have no serious perceptual or cognitive disabilities, and must be fully literate in English. In particular, the tester's corrected vision must be no worse than 20/40.

Intro.4 Measurement Equipment Needed

Various tests require certain kinds of technical equipment, including:

- Small ruler
- 15x magnifier
- Tape measure
- Stopwatch
- Level
- Oscilloscope
- Photometer
- A broadband instrument for measuring sound volume, using an A-weighting filter (as per IEEE 269)
- Force gauge

Intro.5 General Rules and Background Assumptions for Testing

Intro.5.1 Rules for All VVSG Testing

The following principles apply to all of the VVSG tests.

Read the VVSG: The full wording of requirements and accompanying discussions is not repeated herein. It is assumed that as the test lab proceeds through the test procedures, it is consulting the official VVSG text of the requirements being addressed. The test procedures cannot be correctly understood in isolation from the underlying VVSG.

Use of Judgment: Although the purpose of this document is to lay out defined and repeatable procedures for testing a voting system against the VVSG, the task of determining conformance is not one that can always be done "mechanically".
The tester may need to apply reasoned judgment when performing the testing, taking into account the general meaning and purpose of the requirement under test.

Significant Difficulty: Many of the VVSG requirements stipulate that voters must be able to perform certain functions. This does not always provide an unambiguous, "bright-line" test. Herein, we adopt the notion of "significant difficulty". The features provided by the system need not provide an effortless experience - e.g. the voter may have to figure out how to write in a candidate or change a selection - but the feature provided must not be excessively clumsy or complex for the voter.

Use of "Applies to" Clause: The "Applies to" clause of each requirement also governs which test methods are to be executed. E.g., if a requirement applies only to VEBD-A systems, then the corresponding test shall be executed if and only if the VSUT is a VEBD-A system (i.e. has an editable ballot and audio interface).

Serendipitous Detection of Failure: Although each test method is designed for specific requirements, it may also reveal violations of other requirements. These violations are to be noted by the tester and are counted as failures, just as if they had been the explicit purpose of the test.

Abandoning a Test Method: Some of the test methods have later tests that are dependent on earlier parts of the sequence. In general, the test lab should proceed through as much of the test method as is practical so as to check the system thoroughly. But if a failure early in the test method renders the rest of the scenario meaningless, then it may be abandoned, as long as the reasons are documented.

Order of Tests: Although the tests as presented in this document generally follow the order of requirements in the VVSG, they may be performed in any order.

Requirements and Recommendations: Test methods are specified for both mandatory ("shall") and optional ("should") requirements. The test method defines the conditions under which the system fails the requirement, but of course failure to implement an optional requirement does not prevent a system from conforming.

Documentation of Failure Conditions: When the test lab determines that the system fails a given requirement, it shall document the precise conditions under which failure was detected.

System Deployed as Intended: Unless otherwise stated, the test lab examines and operates the system as deployed according to the instructions of the manufacturer.

Intro.5.2 Rules Specifically for Usability Testing

The following principles apply to all the Usability tests.

Pass/Fail Criteria: Each test method (with a few exceptions) contains one or more pass/fail criteria. These are explicit statements about the conditions under which the system being tested passes or fails. They are marked with a => icon throughout this document. This icon is preceded by a "P" in the case of "pass" conditions, an "F" for "fail" conditions, and "PF" for "pass/fail" conditions. Since each SRTM applies only to the one requirement under which it is listed, it is implicit that the system is passing or failing that requirement. CRTMs, on the other hand, are used to test several requirements, and so, within a CRTM, the pass/fail criterion will also identify the requirement being passed or failed.

Implicit Passing: Many test methods include a number of steps, for each of which the system must perform correctly or it fails. In general, it is easier to confirm that a system has not met a requirement than that it has. If the test method is completed successfully without any failures, then the system passes.

Adequacy of Messages to the Voter: There are many requirements in which a "warning" or "notification" or "indication" must be issued to the voter. In general, these do not prescribe when the information is issued (e.g.
as a particular vote is attempted, or during a final review) nor the precise format (visual or audio) and content of the warning. Note especially that in the case of manually marked paper ballots, some voter information may be posted within the voting booth, rather than on the ballot itself. The test lab must determine whether the behavior of the system constitutes a conspicuous, specific, and informative message, such as would be adequate for the voter.

Review by Two Experts: When the test method involves expert review of the VSUT, the review is to be carried out by two experts. This is to improve both the thoroughness and objectivity of the test.

Access to CVR: In order to perform some tests, the test lab must have access to the electronic Cast Vote Record (CVR). The VVSG requires that voting systems retain records of individual ballots (see Part 1, section 4.3.2 XREF). The test lab must determine (either from system documentation or from the manufacturer) how to gain such access.

Test Method Dependence on System Class: There are some requirements that, while applying to all voting systems, may be met in various ways, depending on the type of system. In particular, within a single requirement there may be a different test method for VEBD systems than for non-VEBD systems. When this occurs, the scope of each test method will be described explicitly.

Audio Interface: Some tests have to be performed twice, once using the visual interface, and then again using the audio interface (if available). Note that the accessible voting station (class Acc-VS) is a subset of editable systems with audio (class VEBD-A). Therefore any test that applies to VEBD-A systems also applies to Acc-VS systems.

Degree of Parallelism: Many of the CRTMs call for the test lab to enact a voting session, and, during the session, to check certain features of the system for conformance. The features to be checked in parallel normally form a closely related group (e.g. font characteristics or use of color). The idea is to allow the tester to concentrate on one topic at a time. In theory, some of these sessions could be combined, thereby saving testing time. The test lab is free to adopt this approach if desired, but the testers should be aware that they then have to be careful to check all the relevant system characteristics during that one session.

Use of Standard Test Ballot: Unless otherwise stated, the test lab examines and operates the system using a ballot design that implements the NIST standard test ballot specification. The manufacturer is responsible for implementing the specification on the VSUT.

Default Ballot Choices: Unless otherwise stated, when the test involves going through a voting session and filling out a ballot, the tester shall make the choices described in the following table. Note that these choices represent a completely filled-out ballot (no undervoting).

Contest | Choice (Candidate / Party)
Contest #0: Straight Party Vote | Option #0.2: Yellow
Contest #1: President and Vice-President of the United States | Candidate #1.3: Daniel Court and Amy Blumhardt / Purple
Contest #2: US Senate | Candidate #2.2: Lloyd Garriss / Yellow
Contest #3: US Representative | Candidate #3.1: Brad Plunkard / Blue
Contest #4: Governor | Candidate #4.30: David Davis / Independent
Contest #5: Lieutenant-Governor | Candidate #5.6: Burt Zirkle / Gold
Contest #6: Registrar of Deeds | Candidate #6.1: Laila Shamsi / Yellow
Contest #7: State Senator | Candidate #7.2: Marty Talarico / Yellow
Contest #8: State Assemblyman | Candidate #8.1: Andrea Solis / Blue
Contest #9: County Commissioners | Candidate #9.2: Chloe Witherspoon / Blue; Candidate #9.3: Clayton Bainbridge / Blue; Candidate #9.4: Amanda Marracini / Yellow; Candidate #9.7: Sheila Moskowitz / Purple; Write in "Camille Volpe" as the 5th choice
Contest #10: Court of Appeals Judge | Candidate #10.1: Michael Marchesani
Contest #11: Water Commissioners | Candidate #11.1: Orville White / Blue; Candidate #11.2:
Gregory Seldon / Yellow
Contest #12: City Council | Candidate #12.2: Randall Rupp / Blue; Candidate #12.3: Carroll Shry / Blue; Candidate #12.4: Beverly Barker / Yellow; Candidate #12.7: Reid Feister / Yellow
Retention Question #1 | Yes
Retention Question #2 | No
Referendum #1: PROPOSED CONSTITUTIONAL AMENDMENT C | No
Referendum #2: PROPOSED CONSTITUTIONAL AMENDMENT D | Yes
Referendum #3: PROPOSED CONSTITUTIONAL AMENDMENT H | Yes
Referendum #4: PROPOSED CONSTITUTIONAL AMENDMENT K | No
Referendum #5: BALLOT MEASURE 101: Open Primaries | No
Referendum #6: BALLOT MEASURE 106: Limits on Private Enforcement of Unfair Business Competition Laws | No

Part 1: Usability and Accessibility Requirements

Table of Contents:

3 Usability, Accessibility, and Privacy Requirements
3.1 Overview
3.1.1 Purpose
3.1.2 Special Terminology
3.1.3 Interaction of Usability and Accessibility Requirements
3.2 General Usability Requirements
3.2.1 Performance Requirements
3.2.1.1 Overall Performance Metrics
3.2.1.1-A : Total Completion Performance
3.2.1.1-B : Perfect Ballot Performance
3.2.1.1-C : Voter Inclusion Performance
3.2.1.1-D : Usability metrics from the Voting Performance Protocol
3.2.1.1-D.1 : Effectiveness metrics for usability
3.2.1.1-D.2 : Voting session time
3.2.1.1-D.3 : Average voter confidence
3.2.1.2 Manufacturer Testing
3.2.1.2-A : Usability Testing by Manufacturer for General Population
3.2.2 Functional Capabilities
3.2.2-A : Notification of Effect of Overvoting
3.2.2-B : Undervoting to be Permitted
3.2.2-C : Correction of Ballot
3.2.2-D : Notification of Ballot Casting
3.2.2.1 Editable Interfaces
3.2.2.1-A : Prevention of Overvotes
3.2.2.1-B : Warning of Undervotes
3.2.2.1-C : Independent Correction of Ballot
3.2.2.1-D : Ballot Editing per Contest
3.2.2.1-E : Contest Navigation
3.2.2.1-F : Notification of ballot casting failure (DRE)
3.2.2.2 Non-Editable Interfaces
3.2.2.2-A : Notification of Overvoting
3.2.2.2-B : Notification of Undervoting
3.2.2.2-C : Notification of Blank Ballots
3.2.2.2-D :
Ballot Correction or Submission Following Notification
3.2.2.2-E : Handling of Marginal Marks
3.2.2.2-F : Notification of ballot casting failure (PCOS)
3.2.3 Privacy
3.2.3.1 Privacy at the Polls
3.2.3.1-A : System Support of Privacy
3.2.3.1-A.1 : Visual Privacy
3.2.3.1-A.2 : Auditory Privacy
3.2.3.1-A.3 : Privacy of Warnings
3.2.3.1-A.4 : No Receipts
3.2.3.2 No Recording of Alternative Format Usage
3.2.3.2-A : No Recording of Alternative Languages
3.2.3.2-B : No Recording of Accessibility Features
3.2.4 Cognitive Issues
3.2.4-A : Completeness of Instructions
3.2.4-B : Availability of Assistance from the System
3.2.4-C : Plain Language
3.2.4-C.1 : Clarity of Warnings
3.2.4-C.2 : Context before Action
3.2.4-C.3 : Simple Vocabulary
3.2.4-C.4 : Start Each Instruction on a New Line
3.2.4-C.5 : Use of Positive
3.2.4-C.6 : Use of Imperative Voice
3.2.4-C.7 : Gender-based Pronouns
3.2.4-D : No Bias among Choices
3.2.4-E : Ballot Design
3.2.4-E.1 : Contests Split among Pages or Columns
3.2.4-E.2 : Indicate Maximum Number of Candidates
3.2.4-E.3 : Consistent Representation of Candidate Selection
3.2.4-E.4 : Placement of Instructions
3.2.4-F : Conventional Use of Color
3.2.4-G : Icons and Language
3.2.5 Perceptual Issues
3.2.5-A : Screen Flicker
3.2.5-B : Resetting of Adjustable Aspects at End of Session
3.2.5-C : Ability to Reset to Default Values
3.2.5-D : Minimum Font Size
3.2.5-E : Available Font Sizes
3.2.5-F : Use of Sans Serif Font
3.2.5-G : Legibility of Paper Ballots and Verification Records
3.2.5-G.1 : Legibility via Font Size
3.2.5-G.2 : Legibility via Magnification
3.2.5-H : Contrast Ratio
3.2.5-I : High Contrast for Electronic Displays
3.2.5-J : Accommodation for Color Blindness
3.2.5-K : No Reliance Solely on Color
3.2.6 Interaction Issues
3.2.6-A : No Page Scrolling
3.2.6-B : Unambiguous Feedback for Voter's Selection
3.2.6-C : Accidental Activation
3.2.6-C.1 : Size and Separation of Touch Areas
3.2.6-C.2 : No Repeating Keys
3.2.6.1 Timing Issues
3.2.6.1-A :
Maximum Initial System Response Time
3.2.6.1-B : Maximum Completed System Response Time for Vote Confirmation
3.2.6.1-C : Maximum Completed System Response Time for All Operations
3.2.6.1-D : System Response Indicator
3.2.6.1-E : Voter Inactivity Time
3.2.6.1-F : Alert Time
3.2.7 Alternative Languages
3.2.7-A : General Support for Alternative Languages
3.2.7-A.1 : Voter Control of Language
3.2.7-A.2 : Complete Information in Alternative Language
3.2.7-A.3 : Auditability of Records for English Readers
3.2.7-A.4 : Usability Testing by Manufacturer for Alternative Languages
3.2.8 Usability for Poll Workers
3.2.8-A : Clarity of System Messages for Poll Workers
3.2.8.1 Operation
3.2.8.1-A : Ease of Normal Operation
3.2.8.1-B : Usability Testing by Manufacturer for Poll Workers
3.2.8.1-C : Documentation usability
3.2.8.1-C.1 : Poll Workers as target audience
3.2.8.1-C.2 : Usability at the polling place
3.2.8.1-C.3 : Enabling verification of correct operation
3.2.8.2 Safety
3.2.8.2-A : Safety Certification
3.3 Accessibility Requirements
3.3.1 General
3.3.1-A : Accessibility throughout the Voting Session
3.3.1-A.1 : Documentation of Accessibility Procedures
3.3.1-B : Complete Information in Alternative Formats
3.3.1-C : No Dependence on Personal Assistive Technology
3.3.1-D : Secondary Means of Voter Identification
3.3.1-E : Accessibility of Paper-based Vote Verification
3.3.1-E.1 : Audio Readback for Paper-based Vote Verification
3.3.2 Low Vision
3.3.2-A : Usability Testing by Manufacturer for Voters with Low Vision
3.3.2-B : Adjustable Saturation for Color Displays
3.3.2-C : Distinctive Buttons and Controls
3.3.2-D : Synchronized Audio and Video
3.3.3 Blindness
3.3.3-A : Usability Testing by Manufacturer for Blind Voters
3.3.3-B : Audio-Tactile Interface
3.3.3-B.1 : Equivalent Functionality of ATI
3.3.3-B.2 : ATI Supports Repetition
3.3.3-B.3 : ATI Supports Pause and Resume
3.3.3-B.4 : ATI Supports Transition to Next or Previous Contest
3.3.3-B.5 : ATI Can Skip
Referendum Wording
3.3.3-C : Audio Features and Characteristics
3.3.3-C.1 : Standard Connector
3.3.3-C.2 : T-coil Coupling
3.3.3-C.3 : Sanitized Headphone or Handset
3.3.3-C.4 : Initial Volume
3.3.3-C.5 : Range of Volume
3.3.3-C.6 : Range of Frequency
3.3.3-C.7 : Intelligible Audio
3.3.3-C.8 : Control of Speed
3.3.3-D : Ballot Activation
3.3.3-E : Ballot Submission and Vote Verification
3.3.3-F : Tactile Discernability of Controls
3.3.3-G : Discernability of Key Status
3.3.4 Dexterity
3.3.4-A : Usability Testing by Manufacturer for Voters with Dexterity Disabilities
3.3.4-B : Support for Non-Manual Input
3.3.4-C : Ballot Submission and Vote Verification
3.3.4-D : Manipulability of Controls
3.3.4-E : No Dependence on Direct Bodily Contact
3.3.5 Mobility
3.3.5-A : Clear Floor Space
3.3.5-B : Allowance for Assistant
3.3.5-C : Visibility of Displays and Controls
3.3.5.1 Controls within Reach
3.3.5.1-A : Forward Approach, No Obstruction
3.3.5.1-B : Forward Approach, with Obstruction
3.3.5.1-B.1 : Maximum Size of Obstruction
3.3.5.1-B.2 : Maximum High Reach over Obstruction
3.3.5.1-B.3 : Toe Clearance under Obstruction
3.3.5.1-B.4 : Knee Clearance under Obstruction
3.3.5.1-C : Parallel Approach, No Obstruction
3.3.5.1-D : Parallel Approach, with Obstruction
3.3.5.1-D.1 : Maximum Size of Obstruction
3.3.5.1-D.2 : Maximum High Reach over Obstruction
3.3.6 Hearing
3.3.6-A : Reference to Audio Requirements
3.3.6-B : Visual Redundancy for Sound Cues
3.3.6-C : No Electromagnetic Interference with Hearing Devices
3.3.7 Cognition
3.3.7-A : General Support for Cognitive Disabilities
3.3.8 English Proficiency
3.3.8-A : Use of ATI
3.3.9 Speech
3.3.9-A : Speech not to be Required by Equipment

3.2 General Usability Requirements

3.2.1 Performance Requirements

3.2.1.1 Overall Performance Metrics

3.2.1.1-A Total Completion Performance
Test Method: Voting Performance Protocol (VPP)

3.2.1.1-B Perfect Ballot Performance
Test Method: Voting Performance Protocol (VPP)

3.2.1.1-C Voter
Inclusion Performance
Test Method: Voting Performance Protocol (VPP)

3.2.1.1-D Usability metrics from the Voting Performance Protocol
Test Method: Voting Performance Protocol (VPP)

3.2.1.1-D.1 Effectiveness metrics for usability
Test Method: Voting Performance Protocol (VPP)

3.2.1.1-D.2 Voting session time
Test Method: Voting Performance Protocol (VPP)

3.2.1.1-D.3 Average voter confidence
Test Method: Voting Performance Protocol (VPP)

3.2.1.2 Manufacturer Testing

3.2.1.2-A Usability Testing by Manufacturer for General Population
Test Method: Usability Testing by Manufacturer

3.2.2 Functional Capabilities

3.2.2-A Notification of Effect of Overvoting
Test Method: If the system is a VEBD type, this requirement is covered under XREF 3.2.2.1-A Prevention of Overvotes. If the system is a PCOS type, this requirement is covered under XREF 3.2.2.2-A Notification of Overvoting. If the system is one with an MMPB and no immediate feedback to the voter (such as with central count systems), the tester shall inspect the system and verify that notification is readily available to the voter. For example, this may be achieved by posting the notification within a voting booth or stall, or by including the notification directly on the paper ballot. For types of systems other than those mentioned above, the tester shall verify that notification is given in a way that is appropriate for the system.
PF => If adequate notification on the effect of overvoting is readily available to the voter, then the system passes; otherwise it fails.

3.2.2-B Undervoting to be Permitted
Test Method: The tester shall fill out the ballot using the default ballot choices, except that 1) no party is chosen in race #0 (straight party vote), 2) no candidate is chosen in race #5 (Lieutenant-Governor), and 3) in race #9 (county commissioners), only Candidate #9.3 (Clayton Bainbridge / Blue) and Candidate #9.4 (Amanda Marracini / Yellow) are chosen. The tester shall then attempt to cast the ballot.
The system must allow this to be submitted as a valid ballot (whether or not a warning is issued).
PF => If the system accepts the undervoted ballot, then the system passes; otherwise it fails.

3.2.2-C Correction of Ballot
Test Method: For VEBD systems, use the test method described below under XREF 3.2.2.1-C "Independent Correction of Ballot". For non-VEBD systems, the tester shall verify that instructions on how to correct a ballot are readily available to the voter. For example, this may be achieved by posting the instructions within a voting booth or stall, or by including the instructions directly on the ballot.
PF => If instructions for correcting the ballot are readily available to the voter, then the system passes; otherwise it fails.
Since the actual correction of a non-VEBD ballot typically depends on procedures extraneous to the actual equipment (such as getting a new paper ballot from a poll worker), the correction process itself is not tested.

3.2.2-D Notification of Ballot Casting
Test Method: Editable Ballot Session, Non-Editable Ballot Session

3.2.2.1 Editable Interfaces

3.2.2.1-A Prevention of Overvotes
Test Method: Editable Ballot Session

3.2.2.1-B Warning of Undervotes
Test Method: Editable Ballot Session

3.2.2.1-C Independent Correction of Ballot
Test Method: Editable Ballot Session

3.2.2.1-D Ballot Editing per Contest
Test Method: Editable Ballot Session

3.2.2.1-E Contest Navigation
Test Method: Editable Ballot Session

3.2.2.1-F Notification of ballot casting failure (DRE)
Test Method: Since this requirement takes effect only in the case of equipment failure, it cannot be tested by deliberately setting up the precondition, as with other requirements.
Rather, the requirement is to be tested opportunistically: if, during any of the other testing procedures (whether or not usability-related), equipment failure for ballot casting is detected, the tester must determine a) whether or not the current ballot was recorded and b) whether an adequate notification to the voter was issued.
PF => If a correct and adequate notification is issued, then the system passes; otherwise it fails.

3.2.2.2 Non-Editable Interfaces

3.2.2.2-A Notification of Overvoting
Test Method: Non-Editable Ballot Session

3.2.2.2-B Notification of Undervoting
Test Method: Non-Editable Ballot Session

3.2.2.2-C Notification of Blank Ballots
Test Method: The tester shall first enable the system for warning about blank ballots. The tester shall then submit paper ballots with the following characteristics (if the system does not accept two-sided ballots, skip those cases):

Ballot | Correct Result
Two-sided ballot, completely blank | Warning
Two-sided ballot, with a single vote on each side | No Warning
Two-sided ballot, blank front side, single vote on back | Warning
Two-sided ballot, single vote on front, blank back side | Warning
One-sided ballot, blank | Warning
One-sided ballot, with a single vote | No Warning

F => If the result in any of these cases is incorrect, then the system fails.
Next, the tester shall disable the system for warning about blank ballots and re-submit the test ballots, as above.
PF => If the system accepts all the ballots without warning, then the system passes; otherwise it fails.

3.2.2.2-D Ballot Correction or Submission Following Notification
Test Method: Non-Editable Ballot Session

3.2.2.2-E Handling of Marginal Marks
Test Method: The tester shall fill out the ballot using the default ballot choices, except that in contest #1, vote for ticket #1.4 (Boone and Lian) for President/VP and also make a marginal mark (as per manufacturer specifications) for ticket #1.7 (Harp and Gray).
In contest #2, vote for no one, but make a marginal mark for candidate #2.4 (Hewetson). When the ballot is submitted, the system must detect, identify, and warn about both marginal marks.
F => If either marginal mark is not detected, identified, and warned about, then the system fails.
There may also be a warning about overvoting in contest #1 or undervoting in contest #2, but this is not mandatory. The tester should then fix the marginal mark in contest #2 so as to make it a valid vote, but leave contest #1 as is, and re-submit the ballot. Again, the system must detect, identify, and warn about the remaining marginal mark in contest #1.
F => If the remaining marginal mark is not detected, identified, and warned about, then the system fails.
F => If the system warns about any marginal mark other than as specified above, then the system fails.

3.2.2.2-F Notification of ballot casting failure (PCOS)
Test Method: Since this requirement takes effect only in the case of equipment failure, it cannot be tested by deliberately setting up the precondition, as with other requirements. Rather, the requirement is to be tested opportunistically: if, during any of the other testing procedures (whether or not usability-related), equipment failure for ballot casting is detected (including failure to read the ballot or to transport it into the ballot box), the tester must determine a) whether or not the current ballot was recorded and b) whether an adequate notification to the voter was issued.
PF => If a correct and adequate notification is issued, then the system passes; otherwise it fails.
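The expected-result matrix in 3.2.2.2-C (Notification of Blank Ballots) lends itself to a table-driven check. The following sketch is illustrative only: the `votes_per_side` tuples and the `expected_warning` helper are hypothetical names we introduce here, not part of any voting system API; the underlying rule (any blank side must warn when warnings are enabled, and nothing warns when they are disabled) is taken directly from the table above.

```python
# Table-driven sketch of the blank-ballot cases in 3.2.2.2-C.
# Each case records the number of votes per ballot side and the
# expected scanner behavior when blank-ballot warnings are enabled.
CASES = [
    ("two-sided, completely blank",          (0, 0), True),
    ("two-sided, single vote on each side",  (1, 1), False),
    ("two-sided, blank front, vote on back", (0, 1), True),
    ("two-sided, vote on front, blank back", (1, 0), True),
    ("one-sided, blank",                     (0,),   True),
    ("one-sided, single vote",               (1,),   False),
]

def expected_warning(votes_per_side, warnings_enabled):
    """Warn only if warnings are enabled and some ballot side is blank."""
    return warnings_enabled and any(v == 0 for v in votes_per_side)

# Every case must match the 3.2.2.2-C table when warnings are enabled,
# and no case may warn once warnings are disabled.
for name, sides, should_warn in CASES:
    assert expected_warning(sides, True) == should_warn, name
    assert not expected_warning(sides, False), name
```

Recording the cases as data, rather than as prose steps, makes it easy for a test lab to log the observed result next to the expected one for each submitted ballot.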
3.2.3 Privacy

3.2.3.1 Privacy at the Polls

3.2.3.1-A System Support of Privacy
Test Method: Privacy of Voting Session

3.2.3.1-A.1 Visual Privacy
Test Method: Privacy of Voting Session

3.2.3.1-A.2 Auditory Privacy
Test Method: Privacy of Voting Session

3.2.3.1-A.3 Privacy of Warnings
Test Method: Privacy of Voting Session

3.2.3.1-A.4 No Receipts
Test Method: Privacy of Voting Session

3.2.3.2 No Recording of Alternative Format Usage

3.2.3.2-A No Recording of Alternative Languages
Test Method: Privacy of Cast Vote Record (CVR)

3.2.3.2-B No Recording of Accessibility Features
Test Method: Privacy of Cast Vote Record (CVR)

3.2.4 Cognitive Issues

3.2.4-A Completeness of Instructions
Test Method: The tester shall proceed through an entire voting session, fill out the ballot using the default ballot choices, and check for the presence of instructions for all the functions supported by the system, especially including:

- System activation and/or session initiation (e.g. use of an activation card)
- Adjustment of visual display characteristics (e.g. font size, color, contrast)
- Adjustment of audio characteristics (e.g. volume, speed)
- Use of other auxiliary devices, such as a magnifier for paper records
- Mechanism for non-manual input
- Navigating back and forth through multiple pages
- Changing a vote
- Writing in a candidate for office
- Review of the ballot
- Final casting of the ballot

Not all of the above functions are mandatory, but if present, the system must explain how they are to be used. The tester should attempt to discover and exercise any and all such functions provided by the system. Note that the system must provide instructions for all its operations, even if some of those go beyond what is mandated by the VVSG.
PF => If adequate instructions are available for all voter operations, then the system passes; otherwise it fails.
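One way to keep the 3.2.4-A inspection systematic is to record, for each function the VSUT actually supports, whether adequate instructions were found, and then apply the implicit-passing rule. A minimal sketch, assuming the tester fills in both sets by hand; the function names and the `check_instructions` helper are illustrative, not VVSG terms of art:

```python
# Sketch of the implicit-passing logic for 3.2.4-A: the system passes
# only if every function it supports has adequate instructions.
def check_instructions(supported, instructed):
    """Return (passed, missing): missing lists the supported functions
    for which no adequate instructions were found."""
    missing = sorted(set(supported) - set(instructed))
    return (not missing, missing)

# Hypothetical tester worksheet: the write-in function is supported
# but no instructions for it were found, so the system would fail.
supported = {"session initiation", "changing a vote", "write-in",
             "ballot review", "ballot casting"}
instructed = {"session initiation", "changing a vote",
              "ballot review", "ballot casting"}
passed, missing = check_instructions(supported, instructed)
assert passed is False and missing == ["write-in"]
```

The returned `missing` list doubles as the documentation of failure conditions required by Intro.5.1.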
3.2.4-B Availability of Assistance from the System
Test Method: For VEBD systems, the tester shall proceed through an entire voting session, using the editable ballot session. Confirm that help is available from the system at these points within the session:

- prior to voting for any of the candidates
- immediately after voting for Governor
- when viewing the session review screen (if any)
- just before final casting of the ballot

If the system under test is an Acc-VS, the above test method must be enacted for both the visual and audio interfaces.
PF => If assistance is available at all the points designated above, then the system passes; otherwise it fails.
For non-VEBD systems, the tester need not enact a voting session. Rather, confirm that written instructions or some other built-in mechanism would be readily available to the voter throughout the voting session. Possibilities for presenting assistance include a poster, information on the ballot itself, or an independent electronic "help" system.
PF => If assistance is readily available to the voter at any time during the voting session, then the system passes; otherwise it fails.

3.2.4-C Plain Language
Test Method: Language Clarity

3.2.4-C.1 Clarity of Warnings
Test Method: Language Clarity

3.2.4-C.2 Context before Action
Test Method: Language Clarity

3.2.4-C.3 Simple Vocabulary
Test Method: Language Clarity

3.2.4-C.4 Start Each Instruction on a New Line
Test Method: Language Clarity

3.2.4-C.5 Use of Positive
Test Method: Language Clarity

3.2.4-C.6 Use of Imperative Voice
Test Method: Language Clarity

3.2.4-C.7 Gender-based Pronouns
Test Method: Language Clarity

3.2.4-D No Bias among Choices
Test Method: The tester shall inspect the ballot and confirm that all candidates and other ballot choices are presented in a fair and equivalent manner. Characteristics such as font size or voice volume and speed must be the same for all choices. For VEBD systems, use the editable ballot session below.
If the system under test is an Acc-VS, the test method must be enacted for both the visual and audio interface. For non-VEBD systems, the tester need not enact a voting session. Rather, confirm that all choices are presented in an equivalent manner with respect to visual appearance, font size, layout, and the like. PF => If all choices are presented without bias, then the system passes, otherwise it fails. 3.2.4-E Ballot Design Test Method: Ballot Design 3.2.4-E.1 Contests Split among Pages or Columns Test Method: Ballot Design 3.2.4-E.2 Indicate Maximum Number of Candidates Test Method: Ballot Design 3.2.4-E.3 Consistent Representation of Candidate Selection Test Method: Ballot Design 3.2.4-E.4 Placement of Instructions Test Method: Ballot Design 3.2.4-F Conventional Use of Color Test Method: Ballot Design 3.2.4-G Icons and Language Test Method: Ballot Design 3.2.5 Perceptual Issues 3.2.5-A Screen Flicker Test Method: The tester shall proceed through the voting session until the first contest (straight party) is displayed. If there is a blinking visual element on this page, proceed through the session until a page without a blinking element is displayed. The measurement is to be taken in a dark room environment. The tester shall use a photometer with an oscilloscope attached to the photometer's output. The flicker rate is then measured from the oscilloscope’s waveform display. Equipment shall have an accuracy of at least ± 1 cd/m2 for light measurements and at least 1kHz bandwidth with a 1 second sweep range for time base measurements. Since the flicker rate is expected to be constant, meters capable of frequency and duty cycle measurements can also be used for this test. If so, the equipment shall have an accuracy of at least ± 0.1 Hz for frequency measurements, and at least ± 2% for duty cycle measurements. FP => If the measured flicker rate is within the 2-55 Hz range, then the system fails, otherwise it passes. 
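For labs that capture the photometer/oscilloscope output digitally, the pass/fail arithmetic of the flicker test above can be sketched in a short script. This is a minimal illustration, not part of the required method: it assumes an evenly sampled luminance trace and a single steady flicker frequency, and the function names are ours.

```python
import math

FAIL_RANGE_HZ = (2.0, 55.0)  # a measured flicker rate in this range fails per 3.2.5-A

def dominant_flicker_hz(samples, sample_rate_hz):
    """Estimate the flicker frequency of an evenly sampled photometer trace
    by counting zero crossings of the mean-removed signal. Adequate here
    because the requirement assumes a single, constant flicker rate."""
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    duration_s = len(samples) / sample_rate_hz
    return crossings / (2.0 * duration_s)  # two crossings per cycle

def flicker_passes(samples, sample_rate_hz):
    low, high = FAIL_RANGE_HZ
    return not (low <= dominant_flicker_hz(samples, sample_rate_hz) <= high)

# Example: a 33 Hz flicker component riding on a steady 100 cd/m^2
# luminance, sampled at 1 kHz for 2 seconds -- inside the fail band.
trace = [100 + 5 * math.sin(2 * math.pi * 33 * k / 1000 + 0.7)
         for k in range(2000)]
```

A trace with no luminance variation yields an estimated rate of 0 Hz, which falls outside the 2-55 Hz fail band, matching the intent of the verdict line.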
3.2.5-B Resetting of Adjustable Aspects at End of Session Test Method: Default Characteristics 3.2.5-C Ability to Reset to Default Values Test Method: Default Characteristics 3.2.5-D Minimum Font Size Test Method: Font Characteristics 3.2.5-E Available Font Sizes Test Method: The tester shall select a font size between 3mm and 4mm and then proceed through the voting session, using the default ballot choices. F => If no such font size is available for selection, then the system fails. On each page, the tester shall measure (using a 15x magnifier) the height of capital letters in the smallest text intended for the voter. F => If any of these letters has a height less than 3.0mm or greater than 4.0mm, then the system fails. After voting for US Representative (Contest #3), the tester shall then select a larger font size between 6.3 and 9.0mm. F => If the font size cannot be changed at this point, then the system fails. F => If the larger font size is not available for selection, then the system fails. The tester shall proceed through contest #6, again measuring the height of capital letters in the smallest text intended for the voter. F => If any of these letters has a height less than 6.3mm or greater than 9.0mm, then the system fails. The tester shall then navigate back to the first three contests and verify that they are being displayed with the larger font size and that the original ballot choices were preserved. F => If the first three contests are not shown in the larger font size, then the system fails. F => If the original ballot choices were not preserved, then the system fails. After voting for Registrar of Deeds (Contest #6), the tester shall then re-select a font size between 3.0mm and 4.0mm by means of the universal reset mechanism specified in 3.2.5-C XREF. The tester shall vote through contest #9, and repeat the above process, verifying the text is of the appropriate size and that earlier ballot choices have been preserved. 
F => If the font size cannot be changed at this point, then the system fails. F => If the original font size is not available for selection, then the system fails. F => If any of the capital letters intended for the voter has a height less than 3.0mm or greater than 4.0mm, then the system fails. F => If the original ballot choices were not preserved, then the system fails. Following contest #9, if there are available font sizes between the two guaranteed by the VVSG, the tester shall select one of these and verify that the display agrees with the selected size. This intermediate size is optional, and so this part of the test is enacted only if such a size is available. F => If the ballot pages are not consistently displayed in the intermediate font size, then the system fails. F => If the original ballot choices were not preserved, then the system fails. 3.2.5-F Use of Sans Serif Font Test Method: Font Characteristics 3.2.5-G Legibility of Paper Ballots and Verification Records Test Method: Note that this requirement and its sub-requirements apply to all the various types of paper records that would normally be available to the voter. This includes the ballot itself, as well as a verification record, as in the case of VVPAT systems. If the system attempts to achieve legibility via font size or magnification, use the test method for the corresponding sub-requirement below. If the system uses some other means to achieve legibility, then a tester with expertise in visual usability shall proceed through an entire voting session, fill out the ballot using the default ballot choices, examine the paper records used by the system, and determine whether the system incorporates features such that a voter with poor reading vision (20/70 farsighted vision) would be able to vote successfully. 20/70 farsighted is defined as the ability to read characters subtending an arc of 17.5 minutes at a distance of 40 cm. Such characters have a height of at least 2mm. 
PF => If a poor-vision voter would be able to read the paper records successfully, then the system passes, otherwise it fails. 3.2.5-G.1 Legibility via Font Size Test Method: This test applies if the system has chosen to meet the legibility requirement via font size. The tester shall proceed through an entire voting session, fill out the ballot using the default ballot choices so as to cause the system to present all the various types of paper records that would normally be available to the voter. The tester shall select a font size between 3.0 and 4.0 mm for paper records. F => If no such font size is available for selection, then the system fails. On each page of all the paper records, the tester shall measure (using a 15x magnifier) the height of capital letters in the smallest text intended for the voter. F => If any of these letters has a height less than 3.0mm or greater than 4.0mm, then the system fails. The tester shall then repeat the above process, except for selecting a font size between 6.3mm and 9.0mm. This may require a separate voting session, as there is no requirement to allow voters to switch font size for paper within a session. F => If no such font size is available for selection, then the system fails. On each page of all the paper records, the tester shall measure (using a 15x magnifier) the height of capital letters in the smallest text intended for the voter. F => If any of these letters has a height less than 6.3mm or greater than 9.0mm, then the system fails. 3.2.5-G.2 Legibility via Magnification Test Method: This test applies if the system has chosen to meet the legibility requirement via magnification. The tester shall proceed through an entire voting session, fill out the ballot using the default ballot choices, and, as instructed by the system, use the magnification mechanism to view all the paper records. F => If there are paper records presented to the voter for which the magnifier is not available, then the system fails. 
The tester shall view the paper records as magnified, and determine whether the records would be readily legible to a voter with 20/70 farsighted vision. 20/70 farsighted is defined as the ability to read characters subtending an arc of 17.5 minutes at a distance of 40 cm. Such characters have a height of at least 2mm. PF => If a voter with 20/70 vision would be able to read the paper records successfully, then the system passes, otherwise it fails. 3.2.5-H Contrast Ratio Test Method: First, the tester must select samples for contrast testing. It is impractical to measure contrast ratios for all of the visual material intended for voters and poll workers. Material intended for voters includes:
Instructions (built-in or external) on the use of the system for voting
The actual ballot or ballot interface
Verification records
Material intended for poll workers includes:
Instructions on the operation of the system
Any labels or instructions affixed to the system itself
The tester should select at least one example from each available type of material. Since the purpose of the test is to assure adequate contrast, the tester should look for examples of potentially low contrast, such as light-colored icons or text on a white background, or dark icons or text on a deeply colored background. It is very difficult, with current technology, to measure the luminance, and hence contrast, of small areas (1-2 pixels wide). Therefore, in the examples chosen for inspection, both the lighter and darker area must be at least 1/2 inch in height and width. Note that the content may be presented on an electronic screen or on a "passive" medium that is to be viewed via ambient light. After selecting examples to be measured, the tester shall use the appropriate procedure described below, depending on the medium of presentation. 
If the medium is passive (such as paper or plastic labels), the tester shall measure the luminance of the foreground item and of the adjacent background, using a spot photometer. The sensitivity of the photometer shall be set so as to simulate an environment with a diffuse ambient light level of 500 lx. F => If the higher luminance of these two measurements is less than three times the lower luminance, then the system fails. If the medium is an electronic screen, the procedure for measuring the ambient contrast ratio is described in Section 308-2 of the VESA Flat Panel Display Measurements standard (FPDM) Version 2. The referenced standard specifies the required test equipment, test setup, and test procedures. The diffuse ambient light level for this test shall be 500 lx. F => If the measured contrast ratio is less than 3:1, then the system fails. 3.2.5-I High Contrast for Electronic Displays Test Method: The tester shall proceed through a voting session using the default ballot choices. Within the session, the tester shall select the high contrast option, either explicitly or by default. Next, the tester must select samples for contrast testing. It is impractical to measure contrast ratios for all of the screens intended for voters. Since the purpose of the test is to assure adequate contrast, the tester should look for examples of potentially low contrast, such as light-colored icons or text on a white background, or dark icons or text on a deeply colored background. The procedure for measuring the ambient contrast ratio is described in Section 308-2 of the VESA Flat Panel Display Measurements standard (FPDM) Version 2. The referenced standard specifies the required test equipment, test setup, and test procedures. The diffuse ambient light level for this test shall be 500 lx. PF => If the measured contrast ratio is at least 6:1, then the system passes, otherwise it fails. 
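The contrast arithmetic shared by 3.2.5-H (minimum 3:1) and 3.2.5-I (minimum 6:1 in high-contrast mode) is simple enough to capture in a few lines. The sketch below assumes the two luminances have already been measured per the procedures above (spot photometer for passive media, VESA FPDM Section 308-2 for screens); the function names are illustrative, not part of the test method.

```python
def contrast_ratio(luminance_a, luminance_b):
    """Ambient contrast ratio between two measured luminances (cd/m^2),
    expressed as brighter:darker, so the result is always >= 1."""
    hi, lo = max(luminance_a, luminance_b), min(luminance_a, luminance_b)
    if lo <= 0:
        raise ValueError("luminance measurements must be positive")
    return hi / lo

def meets_minimum_contrast(foreground, background, ratio_required):
    """ratio_required is 3.0 for 3.2.5-H and 6.0 for 3.2.5-I."""
    return contrast_ratio(foreground, background) >= ratio_required
```

For example, dark text at 30 cd/m^2 on a 120 cd/m^2 background has a 4:1 ratio, which satisfies the 3:1 floor but not the 6:1 high-contrast floor.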
3.2.5-J Accommodation for Color Blindness Test Method: Use of Color 3.2.5-K No Reliance Solely on Color Test Method: Use of Color 3.2.6 Interaction Issues 3.2.6-A No Page Scrolling Test Method: Scrolling and Feedback 3.2.6-B Unambiguous Feedback for Voter's Selection Test Method: Scrolling and Feedback 3.2.6-C Accidental Activation Test Method: Accidental Activation 3.2.6-C.1 Size and Separation of Touch Areas Test Method: Accidental Activation 3.2.6-C.2 No Repeating Keys Test Method: Accidental Activation 3.2.6.1 Timing Issues 3.2.6.1-A Maximum Initial System Response Time Test Method: Response Time 3.2.6.1-B Maximum Completed System Response Time for Vote Confirmation Test Method: Response Time 3.2.6.1-C Maximum Completed System Response Time for All Operations Test Method: Response Time 3.2.6.1-D System Response Indicator Test Method: Response Time 3.2.6.1-E Voter Inactivity Time Test Method: Inactivity Time 3.2.6.1-F Alert Time Test Method: Inactivity Time 3.2.7 Alternative Languages 3.2.7-A General Support for Alternative Languages Test Method: Alternative Languages 3.2.7-A.1 Voter Control of Language Test Method: Alternative Languages 3.2.7-A.2 Complete Information in Alternative Language Test Method: Alternative Languages 3.2.7-A.3 Auditability of Records for English Readers Test Method: Alternative Languages 3.2.7-A.4 Usability Testing by Manufacturer for Alternative Languages Test Method: Usability Testing by Manufacturer 3.2.8 Usability for Poll Workers 3.2.8-A Clarity of System Messages for Poll Workers Test Method: Operational Usability for Poll Workers (PWU) 3.2.8.1 Operation 3.2.8.1-A Ease of Normal Operation Test Method: Operational Usability for Poll Workers (PWU) 3.2.8.1-B Usability Testing by Manufacturer for Poll Workers Test Method: Usability Testing by Manufacturer 3.2.8.1-C Documentation usability Test Method: Operational Usability for Poll Workers (PWU) 3.2.8.1-C.1 Poll Workers as target audience Test Method: Operational Usability for Poll 
Workers (PWU) 3.2.8.1-C.2 Usability at the polling place Test Method: Operational Usability for Poll Workers (PWU) 3.2.8.1-C.3 Enabling verification of correct operation Test Method: Operational Usability for Poll Workers (PWU) 3.2.8.2 Safety 3.2.8.2-A Safety Certification Test Method: The tester shall verify that the system has been certified in accordance with the requirements of UL 60950, Safety of Information Technology Equipment, by a duly authorized safety testing laboratory. FP => If such certification cannot be verified, then the system fails, otherwise it passes. Note that the tester is not expected to perform the safety checks directly, but rather to verify that the system has been certified by a safety lab. 3.3 Accessibility Requirements 3.3.1 General 3.3.1-A Accessibility throughout the Voting Session Test Method: End-to-end Accessibility (E2E-Acc) 3.3.1-A.1 Documentation of Accessibility Procedures Test Method: End-to-end Accessibility (E2E-Acc) 3.3.1-B Complete Information in Alternative Formats Test Method: This general requirement is tested specifically under sec. 3.3.3-B XREF, "Audio-Tactile Interface". 3.3.1-C No Dependence on Personal Assistive Technology Test Method: End-to-end Accessibility (E2E-Acc) 3.3.1-D Secondary Means of Voter Identification Test Method: The tester shall first determine whether the system uses biometric characteristics for voter identification or authentication, such as an electronic poll book that uses fingerprints. P => If biometric measures are not used for voter identification, then the system passes. If biometric measures are used, the tester shall review the documentation of the system to verify that an alternative means is available (such as presentation of identity documentation or another biometric mode). PF => If an alternative means of identification is available, then the system passes, otherwise it fails. 
3.3.1-E Accessibility of Paper-based Vote Verification Test Method: Accessible Ballot Verification and Submission 3.3.1-E.1 Audio Readback for paper-based Vote Verification Test Method: Accessible Ballot Verification and Submission 3.3.2 Low Vision 3.3.2-A Usability Testing by Manufacturer for Voters with Low Vision Test Method: Usability Testing by Manufacturer 3.3.2-B Adjustable Saturation for Color Displays Test Method: Partial Vision 3.3.2-C Distinctive Buttons and Controls Test Method: Partial Vision 3.3.2-D Synchronized Audio and Video Test Method: The tester shall proceed through the voting session using the default ballot choices, except as noted below. The tester shall first select video-only mode (no audio), vote in contests #1 and #2 and proceed to contest #3. In that contest he/she shall vote for a write-in candidate, "Vicki Video". F => If video-only mode is unavailable, then the system fails. F => If video-only mode does not present the ballot visually, while suppressing audio output, then the system fails. The tester shall then switch to audio-only mode (no video) and proceed through contests #4 and #5 to contest #6 in which he/she shall vote for a write-in candidate, "Andy Audio". F => If audio-only mode is unavailable, then the system fails. F => If audio-only mode does not present the ballot aurally, while suppressing visual output, then the system fails. The tester shall then switch to full audio-visual mode, and verify that the ballot choices for the first six contests have been preserved. F => If audio-visual mode is unavailable, then the system fails. F => If audio-visual mode does not present the ballot both visually and aurally, then the system fails. F => If switching among modes has caused the ballot choices to be lost or altered, then the system fails. The tester shall then fill out the remainder of the ballot. Throughout the session, there must be a reasonable correspondence between the visual and auditory presentation of the ballot. 
In particular, when there is a detectable voter action (such as selecting a candidate, advancing to the next page, or typing in a write-in choice) both visual and auditory presentations must respond accordingly. The tester must allow for the fact that a large amount of visual information can be presented "all at once" on a page, whereas auditory information is necessarily presented in a temporal sequence. FP => If there is a significant lack of correspondence between the visual and auditory information presented, then the system fails, otherwise it passes. 3.3.3 Blindness 3.3.3-A Usability Testing by Manufacturer for Blind Voters Test Method: Usability Testing by Manufacturer 3.3.3-B Audio-Tactile Interface Test Method: Audio-Tactile Interface 3.3.3-B.1 Equivalent Functionality of ATI Test Method: Audio-Tactile Interface 3.3.3-B.2 ATI Supports Repetition Test Method: Audio-Tactile Interface 3.3.3-B.3 ATI Supports Pause and Resume Test Method: Audio-Tactile Interface 3.3.3-B.4 ATI Supports Transition to Next or Previous Contest Test Method: Audio-Tactile Interface 3.3.3-B.5 ATI Can Skip Referendum Wording Test Method: Audio-Tactile Interface 3.3.3-C Audio Features and Characteristics Test Method: This general requirement is tested specifically under its sub-requirements. 3.3.3-C.1 Standard Connector Test Method: The tester shall connect a headphone that has a 3.5mm stereo plug and verify that the audio presentation of the ballot is clearly audible through the headphones. FP => If no such jack is available or if there is not a clear audio signal through the headphones, then the system fails, otherwise it passes. 3.3.3-C.2 T-coil Coupling Test Method: The test methods to be used are fully documented in the ANSI standard as cited in the requirement. F => If a wireless T-Coil coupling is not provided, then the system fails. PF => If the wireless T-Coil coupling meets the test criteria for category T4, then the system passes, otherwise it fails. 
3.3.3-C.3 Sanitized Headphone or Handset Test Method: The tester shall inspect the method used by the system to provide a headphone or handset to the voter. Sanitation can be achieved in various ways, including the use of "throwaway" headphones, or of sanitary coverings. F => If no audio device is provided, then the system fails. PF => If there are adequate provisions for sanitization, then the system passes, otherwise it fails. 3.3.3-C.4 Initial Volume Test Method: Audio Volume 3.3.3-C.5 Range of Volume Test Method: Audio Volume 3.3.3-C.6 Range of Frequency Test Method: For this test, the tester needs to control the input signal to the audio equipment, rather than using the normal audio signal generated by the test ballot. Frequency range is measured in one of two ways, depending on whether the audio information is presented through open air or through headphones or a handset. For both modes: The input test signal shall be pink noise with a flatness of at least ± 0.5 dB for all third octave bands from 100Hz to 10KHz. The output level should be 80 dB SPL ± 5 dB as measured with a broadband instrument using an A-weighting filter. This output level should ensure that the audio circuit’s peak capacity isn’t reached and therefore won’t influence the frequencies of interest. The frequency spectrum shall be measured as described in IEEE 269 for all third octave bands from 100Hz through 10KHz and the measured spectrum shall comply within the tolerances of the floating mask requirement. If the audio output falls below XXdB SPL for any frequency between 315Hz and 10KHz, "Range of Frequency" fails. Measurement of Open Air Frequency The test for measuring the audio frequency spectrum is described in IEEE 269. Open air sound levels should be measured in anechoic conditions to prevent reflections from affecting the measurement accuracy. 
If the voting system is designed for operation when both sitting and standing, then measurements shall be taken for both operating positions and for the 5th percentile female and 95th percentile male. Measurement of Headphone/Handset Frequency The test is described in IEEE 269. The referenced standard specifies the required test equipment, test setup, and test procedures. Follow the test methodology that is relevant to receiving audio through a private audio output device applicable to the voting system under test. "Headphones" equates to the term "headsets" used in Clause 9 of the referenced standard. For a HATS (Head and Torso Simulator), Type 3.3 ears shall be used as defined in Clause 5 of the referenced standard. If the ERP (Ear Reference Point) is not specified by the manufacturer of the private audio output device, then the defaults in the referenced standard shall be used. 3.3.3-C.7 Intelligible Audio Test Method: The tester shall proceed through the entire voting session using the default ballot choices, and evaluate the intelligibility of the audio information presented, including the pronunciation of candidate names, instructions, and warnings, the use of normal intonation, appropriate rate of speech, and acceptably low background noise. FP => If the tester judges that significant information would be unintelligible to the voter, then the system fails, otherwise it passes. The loss of small amounts of non-critical information should be noted, but is not by itself a basis for failure. 3.3.3-C.8 Control of Speed Test Method: The tester shall proceed through the voting session to contest #5 (Lt-Governor) and measure the amount of time taken to announce all the candidates, using the default speech rate. The tester shall set the speech rate to its minimum. F => If there is no mechanism for adjusting speech rate, then the system fails. 
The tester shall then skip back to the beginning of contest #5 and again measure the time taken to announce all the candidates. F => If this second time is less than 4/3 of the original time, then the system fails. The tester shall then set the speech rate to its maximum and then skip back to the beginning of contest #5 and again measure the time taken to announce all the candidates. F => If this third time is greater than 1/2 of the original time, then the system fails. 3.3.3-D Ballot Activation Test Method: The tester shall determine if the system supports ballot activation for non-blind voters. P => If ballot activation by the voter is not supported, then the system passes. If the voting station does support ballot activation for non-blind voters, the tester shall proceed through the process of ballot activation, using the features provided for blind voters and verify that these features constitute a viable mechanism for such voters. FP => If blind voters would encounter significant difficulty in activating the ballot, then the system fails, otherwise it passes. 3.3.3-E Ballot Submission and Vote Verification Test Method: Accessible Ballot Verification and Submission 3.3.3-F Tactile Discernability of Controls Test Method: The tester shall proceed through the voting session using the default ballot choices. During the session, the tester shall examine all of the system's buttons, controls, and keys intended for use by the voter and verify that they are distinguishable by shape or texture. Note that not every individual key within a keypad or keyboard need be distinguishable by touch alone (see requirement 3.3.2-C XREF); it is sufficient if certain "home" keys (such as the "5" in the middle of the keypad) are tactilely distinctive. This allows the user to navigate to nearby keys via their position. F => If there are controls or keys which cannot be distinguished by shape or texture, then the system fails. 
The tester shall further verify that the controls and keys are sufficiently insensitive that one can easily touch them lightly so as to determine the distinguishing characteristic and yet not activate the control. FP => If the tester has significant difficulty in tactilely discerning a key or control without also activating it, then the system fails, otherwise it passes. 3.3.3-G Discernability of Key Status Test Method: The tester shall proceed through the voting session using the default ballot choices. During the session, the tester shall examine the system for the presence of locking or toggle controls or keys intended for use by the voter. These are keys (such as "caps lock") that modify the effect of all subsequent input until explicitly reversed. P => If there are no such controls or keys, then the system passes. If there are such controls or keys, the tester shall verify that the status of each is visually discernible. For instance, on many keyboards, there is a small LED, either directly on the "caps lock" key or elsewhere on the keyboard, that is lit if and only if "caps lock" is activated. F => If the status of any such control or key is not visually discernible, then the system fails. The tester shall then activate the locking or toggle function of each such key and verify that the state of the key is discernible either through touch (e.g. a key in a depressed or raised position, or a toggle switch positioned to the left or right) or through some audible feedback (e.g. verbal feedback such as "shift"/"unshift" or via a distinctive tone). FP => If the state of the key is discernible through neither touch nor sound, then the system fails, otherwise it passes. 
3.3.4 Dexterity 3.3.4-A Usability Testing by Manufacturer for Voters with Dexterity Disabilities Test Method: Usability Testing by Manufacturer 3.3.4-B Support for Non-Manual Input Test Method: Non-Manual Operation 3.3.4-C Ballot Submission and Vote Verification Test Method: Non-Manual Operation 3.3.4-D Manipulability of Controls Test Method: The tester shall proceed through an entire voting session, using the conventional manual controls. The tester need not complete a vote for every contest, but must at least proceed through ballot initiation, vote at least one conventional contest, vote for at least one write-in candidate, and perform final vote verification and casting. F => If any operation in the session requires tight grasping, pinching, or twisting of the wrist, then the system fails. The tester shall also measure the activation force required by the controls. The test for measuring force is to use a linear force gauge with a peak indicator (manual or electronic) on the actual controls. This tool can be used for measuring push and/or pull forces. The force gauge shall have an accuracy of at least ± 0.1N (0.02 lbs) and range from zero to at least 27 N (6 lbs). FP => If the activation force for any control exceeds 5 lbs, then the system fails, otherwise it passes. 3.3.4-E No Dependence on Direct Bodily Contact Test Method: The tester shall proceed through an entire voting session, using the conventional manual controls. The tester need not complete a vote for every contest, but must at least proceed through ballot initiation, vote at least one conventional contest, vote for at least one write-in candidate, and perform final vote verification and casting. Throughout the session, the tester shall avoid direct bodily contact with the system. This can be done by use of a non-conductive probe and/or non-conductive gloves when manipulating the controls. PF => If all of the controls respond properly, then the system passes, otherwise it fails. 
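A lab recording gauge readings for the 3.3.4-D activation-force check could apply the threshold with a small helper like the sketch below. The unit conversion and function name are ours; the 5 lbs limit comes from the requirement, and the gauge itself reads in newtons per the specified equipment.

```python
LBF_PER_NEWTON = 0.2248   # pounds-force per newton (conversion factor)
MAX_ACTIVATION_LBF = 5.0  # 3.3.4-D: activation force must not exceed 5 lbs

def control_forces_pass(peak_forces_newtons):
    """peak_forces_newtons: the peak activation force (N) recorded by the
    linear force gauge for each voter-operated control. Fails if any
    control requires more than 5 lbf (about 22.2 N) to activate."""
    return all(f * LBF_PER_NEWTON <= MAX_ACTIVATION_LBF
               for f in peak_forces_newtons)
```

For example, controls measured at 3 N, 10 N, and 22 N all fall under the 5 lbf limit, while a 23 N reading (about 5.2 lbf) would fail the station.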
3.3.5 Mobility 3.3.5-A Clear Floor Space Test Method: As always, the tester shall ensure that the system has been set up according to the documentation supplied by the manufacturer. The tester shall then measure (using a conventional tape measure) the floor area intended for occupation by the voter. The area to be measured must be clear of all obstructions and overhanging elements. F => If the measured area cannot contain a 30x48 inch rectangle, then the system fails. The tester shall then determine whether the floor area is an integral part of the voting system, or whether this area is assumed by the system to be supplied as part of the polling place infrastructure. If the area is integral to the system, the tester shall measure the slope as follows. Use a 24 inch level and a block of material exactly half an inch thick. (A half-inch rise over the 24 inch length of the level corresponds exactly to the maximum permitted slope of 1:48.) Place the level on the floor and rotate it around the center of the area to determine the direction of slope if any. If there is a significant slope, place the block at the lowest point approximately 12 inches from the center of the area. Then place one end of the level on the block and the other end across the center from it (so that the level is along the diameter of a centered circle). F => If the level is sloped downwards towards the block, then the system fails. If the floor area is assumed by the system to be supplied as part of the polling place infrastructure, the tester shall examine the installation documentation to verify that it calls for a slope of no greater than 1:48. F => If the installation documentation does not specify a floor area slope of 1:48 maximum, then the system fails. 3.3.5-B Allowance for Assistant Test Method: As always, the tester shall ensure that the system has been set up according to the documentation supplied by the manufacturer and then approach the station in a wheelchair, oriented and located as intended by the manufacturer. An assistant shall attempt to accompany the tester in the voting area. 
Note that this area may be open or may comprise the inside of a shielded voting booth. The assistant may stand or be seated, as appropriate for the system. F => If there is inadequate room for the assistant to enter or leave the area, then the system fails. F => If there is inadequate room for the assistant to accompany the tester, then the system fails. They then proceed through the entire voting session (including ballot initiation, verification, and submission, as appropriate) using the default ballot choices. Through contest #9, the voting is accomplished by having the tester view the ballot and then give oral instructions to the assistant who carries them out (as if the tester were sighted but had dexterity disabilities). In the remaining contests, the voting is accomplished by having the assistant view the ballot and then give oral instructions to the tester who carries them out (as if the tester had vision disabilities, but not dexterity disabilities). F => If either the tester or assistant has significant difficulty viewing the ballot or other relevant material, then the system fails. The tester shall assess, based on the voting session, whether or not there are significant difficulties in executing the ballot (e.g. because needed controls are hard to reach or to manipulate). Execution includes ballot initiation, verification, and submission, as well as selection of candidates. F => If either the tester or assistant has significant difficulty executing the ballot, then the system fails. 3.3.5-C Visibility of Displays and Controls Test Method: The tester shall deploy the voting station according to the instructions of the manufacturer (including lighting) and approach the station in a wheelchair, oriented and located as intended by the manufacturer. The tester shall have vision no worse than 20/40 corrected. 
The tester shall determine if there is significant difficulty in seeing any of the controls, keys, or audio jacks, or in reading any of the labels, displays, or other elements of the voting station intended for the voter. Potential problems include inadequate font size or excessive glare. FP => If there is significant difficulty in the visibility of elements intended for the voter, then the system fails, otherwise it passes. 3.3.5.1 Controls within Reach 3.3.5.1-A Forward Approach, No Obstruction Test Method: The tester shall measure the high and low reach points of the voting station using a conventional tape measure. See Figure 3-1 for guidance. PF => If both reach points meet the specifications, then the system passes, otherwise it fails. 3.3.5.1-B Forward Approach, with Obstruction Test Method: This general requirement is tested specifically under its sub-requirements. 3.3.5.1-B.1 Maximum Size of Obstruction Test Method: The tester shall measure the depth and height of the forward obstruction of the voting station using a conventional tape measure. See Figure 3-2 for guidance. PF => If the depth and height meet the specifications, then the system passes, otherwise it fails. 3.3.5.1-B.2 Maximum High Reach over Obstruction Test Method: The tester shall measure the high reach point of the voting station using a conventional tape measure. See Figure 3-2 for guidance. PF => If the high reach point meets the specifications (based on obstruction depth, measured previously), then the system passes, otherwise it fails. 3.3.5.1-B.3 Toe Clearance under Obstruction Test Method: The tester shall measure the toe clearance depth and width of the voting station using a conventional tape measure. See Figure 3-2 for guidance. PF => If the toe clearance depth and width meet the specifications, then the system passes, otherwise it fails. 
3.3.5.1-B.4 Knee Clearance under Obstruction Test Method: The tester shall measure the knee clearance depth and width of the voting station using a conventional tape measure. See Figure 3-2 for guidance. The depth shall be measured at heights of 9, 18, and 27 inches. PF => If all of the knee clearance measurements meet the specifications, then the system passes, otherwise it fails. 3.3.5.1-C Parallel Approach, No Obstruction Test Method: The tester shall measure the high and low reach points of the voting station using a conventional tape measure. See Figure 3-3 for guidance. PF => If both reach points meet the specifications, then the system passes, otherwise it fails. 3.3.5.1-D Parallel Approach, with Obstruction Test Method: This general requirement is tested specifically under its sub-requirements. 3.3.5.1-D.1 Maximum Size of Obstruction Test Method: The tester shall measure the depth and height of the side obstruction of the voting station using a conventional tape measure. See Figure 3-4 for guidance. PF => If the depth and height meet the specifications, then the system passes, otherwise it fails. 3.3.5.1-D.2 Maximum High Reach over Obstruction Test Method: The tester shall measure the high reach point of the voting station using a conventional tape measure. See Figure 3-4 for guidance. PF => If the reach point meets the specifications (based on obstruction depth, measured previously), then the system passes, otherwise it fails. 3.3.6 Hearing 3.3.6-A Reference to Audio Requirements Test Method: See tests for 3.3.3-C XREF "Audio Features and Characteristics". 3.3.6-B Visual Redundancy for Sound Cues Test Method: The tester shall proceed through an entire voting session, using the editable ballot session. The voting station shall be in full synchronized audio/visual mode. While voting for contest #2, the tester shall refrain from activity so as to cause the system to issue an inactivity alert (see requirement 3.2.6.1-E XREF). 
If at any time an aural cue is used as a warning or alert (e.g. for inactivity or for attempted overvoting), there must also be a corresponding visual cue. PF => If all aural cues are accompanied by visual cues, then the system passes, otherwise it fails. 3.3.6-C No Electromagnetic Interference with Hearing Devices Test Method: The test methods to be used are fully documented in the ANSI standard as cited in the requirement. PF => If the system meets the test criteria for category T4, then the system passes, otherwise it fails. 3.3.7 Cognition 3.3.7-A General Support for Cognitive Disabilities Test Method: The features mentioned in the Discussion entry are tested as described in the cited sections. 3.3.8 English Proficiency 3.3.8-A Use of ATI Test Method: See tests for 3.3.3-B XREF "Audio-Tactile Interface". 3.3.9 Speech 3.3.9-A Speech not to be Required by Equipment Test Method: The tester shall proceed through an entire voting session. The tester need not complete a vote for every contest, but must at least proceed through ballot initiation, vote in at least one conventional contest, vote for at least one write-in candidate, and perform final vote casting. The tester shall verify that speech is never required to perform any of the functions of the system. FP => If speech is required to perform any voting function, then the system fails, otherwise it passes. 
Part 2: Combined-Requirement Test Methods (CRTMs) in Support of Usability and Accessibility Test Methods Table of Contents:
- Voting Performance Protocol (VPP)
- Usability Testing by Manufacturer
- Editable Ballot Session
- Non-Editable Ballot Session
- Privacy of Voting Session
- Privacy of Cast Vote Record (CVR)
- Language Clarity
- Ballot Design
- Default Characteristics
- Font Characteristics
- Use of Color
- Scrolling and Feedback
- Accidental Activation
- Response Time
- Inactivity Time
- Alternative Languages
- Operational Usability for Poll Workers (PWU)
- End-to-end Accessibility (E2E-Acc)
- Accessible Ballot Verification and Submission
- Partial Vision
- Audio-Tactile Interface
- Audio Volume
- Non-Manual Operation

Test Method: Voting Performance Protocol (VPP) Covers requirements:
- 3.2.1.1-A Total Completion Performance
- 3.2.1.1-B Perfect Ballot Performance
- 3.2.1.1-C Voter Inclusion Performance
- 3.2.1.1-D Usability metrics from the Voting Performance Protocol
- 3.2.1.1-D.1 Effectiveness metrics for usability
- 3.2.1.1-D.2 Voting session time
- 3.2.1.1-D.3 Average voter confidence

This section describes the full Voting Performance Protocol (VPP) for the VVSG tests (those addressing section 3.2.1.1 XREF). A white paper by NIST on Usability Performance Benchmarks for the VVSG discusses the rationale behind many of the design decisions for the VPP. The VPP is by far the most complex test within the Usability and Accessibility section. The general idea is to run a "mock" election under controlled conditions, and then derive metrics for the effectiveness, efficiency, and satisfaction exhibited. VPP Overview 1. Acronyms The following acronyms are used throughout the VPP:
- CI - Confidence interval: a statistical construct, expressing the degree of confidence in a specified accuracy of the result.
- MW - Mann-Whitney test: used to detect whether there is a statistically significant difference between the current and nominal distributions of raw scores for the calibration system.
- NIB - Number of invalid ballots: determined by the number of participants who responded on the post-test questionnaire that they did NOT try to follow instructions. In addition, a standardized method for determining participants who did not really attempt to follow the directions is used to eliminate their data from the analysis.
- PBI - Perfect Ballot Index: effectiveness metric, as tested here.
- TCS - Total Completion Score: effectiveness metric, as tested here.
- VII - Voter Inclusion Index: effectiveness metric, as tested here.
- VPP - Voting Performance Protocol: the test method for usability performance.
- VSUT - Voting System Under Test: the system for which conformance to the VVSG is being evaluated.
- VVSG - Voluntary Voting System Guidelines: the set of requirements against which one tests conformance by a VSUT.

VPP Overview 2. Test Method Calibration System (TMCS) In any test of this type there exists a possibility that a particular run of the test is invalid. This could be due to a problem with, for example, its preparation (including participant recruitment), administration, or results analysis. In order to guard against such measurement errors, two systems are tested in parallel: the actual voting system under test (VSUT), and a so-called test method calibration system (henceforth referred to simply as the calibration system). The nominal (i.e. expected) effectiveness results for the calibration system have been previously established. Therefore, if the current results from the calibration system match these nominal results closely enough, then the testing is presumed to be valid; otherwise the test must be repeated, after the testing problems have been identified and corrected. As an analogy, one might use a standard kilogram artifact to ensure that an instrument for measuring mass is operating correctly. Note that the purpose of calibration is to ensure consistency among tests. 
It is not assumed that, within a single test, the calibration system and the VSUT are similar, either in their architecture or their effectiveness. Further details on calibration may be found within the test description below. As of September 2008, there is no officially designated calibration system. However, a proof-of-concept calibration system has been repeatedly measured so as to establish its effectiveness characteristics for the purpose of calibration. In order to illustrate how the testing will work when such results are known, this test method refers throughout to a fictitious calibration system, named "System X". In the example data and calculations for System X, the variable X_nominal refers to System X's previous (a.k.a. "nominal") results and X_current refers to its current results. System X (fictitious) Note: There may be more than one TMCS in existence at any point in time. However, we will only reference one in this document for the sake of the example calculations provided. VPP Overview 3. Statistical Techniques and Software Support The references in this section discuss the VPP's various statistical techniques. The Perl code is included not only to allow computation but also to provide a detailed description of the algorithms being used. Many of the complex VPP procedures described below are supported by the benchmark calculation Perl scripts. Disclaimer: This protocol is in draft form. Thus, some of the techniques and associated scripts may be subject to change.
- Mann-Whitney: Mann-Whitney U (Wikipedia) article; Perl source code; link to Excel spreadsheet (TBD)
- Adjusted Wald Method: Estimating Completion Rates from Small Samples Using Binomial Confidence Intervals: Comparisons and Recommendations; On-line calculator; Perl source code; link to Excel spreadsheet (TBD)
- Capability index: What is Process Capability?; Perl source code; link to Excel spreadsheet (TBD)
VPP Overview 4. 
Key Variables As we proceed through the description, we shall refer to certain named quantities that must be observed or computed. These quantities are per-system, not for the two systems (VSUT and calibration) combined. Here is a summary:
- NPART: number of participants who attempt to vote on the system. Value range: at least 100.
- NCAST: number of participants who successfully cast a ballot on the system. Value range: 100 to NPART.
- NPERFECT: number of participants who successfully cast a perfectly correct ballot on the system. Value range: 0 to NCAST.
- NCORRECT-i: number of voting opportunities successfully taken by the i-th participant. Value range: 0 to 28.
- PCORRECT-i: proportion of voting opportunities successfully taken by the i-th participant. Value range: 0 to 1.00.
- TASKTIME-i: number of seconds taken by the i-th participant to complete the voting task. Typical value in the hundreds.

VPP Overview 5. Role of Manufacturer The system manufacturer may or may not observe the test, according to the practices of the test lab. However, no manufacturer representative may have any contact with participants before, during, or after the test. VPP Overview 6. Protocol Steps Here are the major steps of the Voting Performance Protocol:
1. Recruit and schedule participants
2. Set up environment
3. Set up voting systems
4. Prepare participants
5. Conduct the voting
6. Debrief participants
7. Data collection
8. Check calibration results
9. Analyze data
10. Report system results

VPP Step 1. Recruit and schedule participants The test lab must be sure that it has met all Federal and state legal requirements for the use of human subjects. For a valid test, there must be at least 100 participants who succeed in casting the ballot for each of the two systems. The test lab will typically have to "over-recruit" to allow for subjects who do not show up, who are ineligible for various reasons, who fail to cast a ballot, or who do not follow instructions. This is a between-subjects test - each participant uses either the VSUT or calibration system, not both. 
The participant population is limited to individuals who:
- are US citizens eligible to vote
- are proficient in English
- do not consider themselves to be a person with a disability
- have not previously participated in a test using this voter performance protocol
- are not poll workers and do not work in any other part of the voting process
- have no significant connection to any manufacturer of voting systems - e.g. no close relative as an employee or owner

Both pools of participants should be balanced according to certain demographic criteria, with the target distribution as follows:
- Gender -- female: 55%; male: 45%
- Race -- African-American: 10%; Non-Hispanic White: 80%; Hispanic: 10%
- Education -- Some College: 20%; College Graduate: 50%; Post-Graduate degree: 30%
- Age -- 25-34 yrs: 30%; 35-44 yrs: 35%; 45-54 yrs: 35%

Whoever performs the recruiting (either the test lab itself, or a recruitment company) should try to achieve the target percentages presented above. However, even if the actual test population varies from these targets, the test is still to be considered valid as long as the results from the calibration system are satisfactory. Here is an example of a screening questionnaire that may be useful for recruitment. The participants should be scheduled for staggered arrival at the testing site, so as to avoid excessive waiting. Since the voting session itself can easily take 10 minutes, it would be reasonable to separate participant arrivals by about 15 minutes, but the optimum interval depends strongly on the system being tested. See the page on "Performance Timing" in the benchmark data gathered earlier to get a sense of the range of voting times. VPP Step 2. Set up environment The goal, as far as possible, is to simulate an optimal polling place. Thus, any errors detected will not be traceable to extraneous environmental factors. There must be sufficient room in which to carry out the mock voting, using at least two voting stations. 
The voting area should have the following characteristics:
- Size: minimum 12' by 15' by 8' high.
- Ambient lighting should be in the range of 400-600 lx. If possible, use indirect lighting rather than overhead fixtures or direct sunlight so as to reduce glare.
- Ambient noise levels should be below 40 dB.
- Ventilation should be such as to avoid either a "stuffy" or "drafty" feeling.
- Temperature should be between 68 and 76 degrees Fahrenheit.
- Relative humidity should be between 20% and 60%.
- There should be sufficient separation between participants to maintain their privacy while they perform the tasks using different systems.

See this OSHA guideline for more detailed recommendations. This University of Wisconsin webpage is also useful. VPP Step 3. Set up voting systems The VPP, as well as many other usability and accessibility tests, requires the manufacturer to set up the voting system with a ballot based on the NIST standard ballot specification. The manufacturer is responsible for the actual ballot design (fonts, layout, etc). Once this ballot has been loaded on both systems (the VSUT and the calibration system), the test lab must set these up as described in their documentation and prepare them to receive votes. VPP Step 4. Prepare participants The test facilitators are responsible for preparing the participants for the test procedure. Follow these steps for each participant:
- Greet incoming participants and verify that they are here for the appropriate purpose.
- Administer an informed consent and release form, as appropriate. Here is an example of the form NIST used during development of the test, but this should be customized to suit the test lab. Usually, you would witness the participant's signature and then sign the form yourself as a witness.
- Hand each participant a copy of the overview of the voting instructions and have him or her read it over.
- Do not coach the participant on strategies for voting or on how to use the voting system. 
The goal is to minimize, if not eliminate entirely, any "facilitator effect". Finally, escort the participant to the system (either the VSUT or the calibration system), give them the complete copy of the voting instructions (including the specific names to vote for, etc.), and, if appropriate, give them any artifacts that they need to activate the system (access card, numeric code, etc) to begin voting. VPP Step 5. Conduct the voting The two systems (the VSUT and calibration system) are to be tested in parallel. The test participants are not informed which system is being considered for certification. Depending on the type of system, the facilitator may be required to start the voting process. The facilitator should refrain from otherwise interacting with the participant during their voting session so as to adhere to the testing protocol. Any interaction between the participant and the facilitator once testing has started would invalidate the result from that participant. It is acknowledged that this is not the usual practice in real elections, where poll workers are available to assist voters if they have problems. However, this limitation is necessary to ensure valid and reliable data, since we need to eliminate the helpfulness of facilitators as a factor. If the facilitator is asked for assistance, this should be the reply: "I'm sorry. I can't provide you with any help. Please do the best you can. If you are stuck and cannot continue, you can stop." For each system, there should be two "observers". They are responsible for determining two items of data via observation of each participant: 1) whether or not the participant successfully casts a ballot (regardless of whether the ballot choices are correct) and 2) the time taken to vote. VPP Step 5.1 Successful Casting If the participant fails to successfully cast a ballot, this should be noted, as this class of error will be counted separately from accuracy, timing, and satisfaction data. 
The participant who fails to cast cannot be counted as part of NCAST. Examples of such failure include simply abandoning the system, or leaving the system under the mistaken impression that the ballot has been cast. If the observer needs to cast the ballot in order to clear the system for the next participant, this should be done as soon as it is obvious that the participant has abandoned the session, but note that this ballot should not be counted when computing accuracy metrics (perfect ballot index and voter inclusion index). Although most systems have a way to discard a non-cast ballot, if this is not possible on a specific system, the observer could mark the ballot by writing in a vote for "DONOTUSE" as Governor, so that the ballot can be identified later on. Non-cast ballots are also excluded from timing data. VPP Step 5.2 Collecting timing data during testing The observers are responsible for timing. Timing will begin when the participant reaches the system or starts reading the complete voting instructions. Note that there may be a delay before the participant actually commences the first step (i.e., enters the activation card, receives the paper ballot, etc.); this delay is considered to be part of the session to be timed, since the time taken to understand how to begin using the system is significant. Once the participant has completed the voting task, the observers will record the elapsed time. What constitutes completion of the voting session depends on the type of system. For simple DREs, walking away from the system signals end of session. Other systems may involve the verification of a paper record or submittal of a paper ballot to an optical scan device. Participants should use the system as it is intended to be used in normal practice. The recorded time is TASKTIME-i. 
VPP Step 5.3 Sufficient number of ballots At least 100 valid ballots are needed for the analysis, but additional recruited participants who show up are allowed to vote so that the demographics of the recruited pool are maintained as well as possible. Note that the statistical reliability of the test depends upon there being at least 100 validly cast ballots for each system. VPP Step 6. Debrief participants Once the participant has completed the voting session, a worker administers the post-test questionnaire, to be filled out by the participant him/herself. The only questions are (1) whether they tried to follow the instructions telling them how to vote, (2) how confident they are in their performance, and (3) how well they liked the system. VPP Step 6.1 "Invalid Ballots" and the Validity of the VPP If a participant's cast ballot demonstrates clear patterns that show they did not attempt to follow the voting instructions they were given, then that ballot is considered "invalid" as a representative measurement according to this protocol. Disclaimer: This protocol is in draft form. Thus, NIST is working to determine if an objective, repeatable method can be found for removing an invalid ballot. VPP Step 6.2 Stipend Paid to Participants Finally, a worker provides the participant with his/her compensation ($50.00) and thanks the participant for his/her time. VPP Step 7. Data collection For each participant/ballot, the following basic data needs to be collected. Note that it might not be possible to associate each participant with his/her ballot.
- Which system did he/she exercise (VSUT/calibration)
- Successful ballot casting? (yes/no)
- If ballot was cast, number of correct votes (0-28)
- Time on task, in seconds
- Response to post-test question on "effort" (yes/no)
- Response to post-test question on confidence (1-5)
- Response to post-test question on likability (1-5)
The counting procedure for the number of correct votes is described below. 
VPP Step 7.1 Scoring Ballots First, you must use the ballot as recorded, not as marked. That is, the result of the test is the internal electronic record of the ballot (whether resulting from a DRE or from optical scanning of paper or the like), not the DRE screen or the paper ballot as such. Note that an overvoted contest on a paper ballot usually results in no votes recorded for that contest. Also, the interpretation of a straight party vote together with votes for individual candidates is not always obvious so the way the system "counts" votes must be understood and confirmed. For each ballot, you must count up the number of voting opportunities correctly executed, with 28 being a perfect score. Seventeen contests (not including the straight party contest) are vote-for-1, accounting for 17 points. In addition, there is a vote-for-5, a vote-for-2, and a vote-for-4 contest, accounting for the other 11 points. Please note that a selection in the straight party contest does not contribute directly to the point total. Rather it is only the effect of that contest in selecting actual candidates (as reflected in the ballot-as-recorded) that is considered. For all write-in choices, if the name is spelled exactly as given in the instructions, it is a correct vote, otherwise not. However, you should ignore extra spaces and any upper/lower case distinction when checking spelling. Most of the contests on the ballot are vote-for-1, and for these the counting is simple: If the participant voted for the correct choice (and no one else), count it as 1, otherwise 0. If the instructions were to not vote (undervote) that contest, then, if the contest was unvoted, count it as 1, otherwise 0. There are three vote-for-N contests. The County Commissioners contest is vote-for-5, so start with a score of 5. For each of the 5 instructed commissioners not voted for, subtract 1. For each un-instructed commissioner who was voted for, subtract 1. If the result is less than 0, count it as 0. 
Likewise, the Water Commissioners contest is vote-for-2, so start with a score of 2. For each of the 2 instructed commissioners not voted for, subtract 1. For each un-instructed commissioner who was voted for, subtract 1. If the result is less than 0, count it as 0. Finally, the City Council contest is vote-for-4, so start with a score of 4. For each of the 3 instructed candidates (the instructions deliberately call for undervoting) not voted for, subtract 1. For each un-instructed candidate who was voted for, subtract 1. If the result is less than 0, count it as 0. Thus, the total raw score for each ballot is a number from 0 to 28 (which we will call NCORRECT-i). From this we immediately derive a scaled score for each ballot: PCORRECT-i = NCORRECT-i / 28 VPP Step 7.2 Software Support The benchmark calculation Perl scripts (as described here) can perform this procedure. VPP Step 7.3 Retention of Records All records pertaining to the test data (whether created by the voting system or by the test facilitator) should be stored safely and privately for future reference. The purpose is twofold: first, to protect participant privacy, and second, to allow any questions about the test results to be resolved based on direct evidence. VPP Step 8. Check calibration results In order to ensure the validity of the testing procedure, the current results from the calibration system are compared to its nominal results. The nominal results have been previously validated as truly representative of the performance of the calibration system. Therefore, if the current results differ significantly from the nominal results, the entire test is rejected as invalid. In such a case, no conclusions can be drawn about whether the VSUT does or does not meet the requirements in section 3.2.1.1 XREF, i.e. the VSUT neither passes nor fails these requirements. 
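The counting rules above can be sketched in code. The following Python fragment is an illustration only; the official computation is performed by the referenced benchmark calculation Perl scripts. The function names and the set-based representation of ballot selections are assumptions made for this example.

```python
def score_vote_for_n(n, instructed, voted):
    """Score one vote-for-N contest per the VPP counting rules (sketch).

    Start at N; subtract 1 for each instructed choice not voted for and
    for each un-instructed choice voted for; floor the result at 0.
    `instructed` and `voted` are sets of candidate names, normalized as
    the protocol directs (write-ins compared ignoring case and spaces).
    """
    missed = len(instructed - voted)   # instructed but not selected
    extra = len(voted - instructed)    # selected but not instructed
    return max(n - missed - extra, 0)


def score_vote_for_1(instructed, voted):
    """A vote-for-1 contest scores 1 only on an exact match.

    `instructed` is a single name, or None when the instructions call
    for an undervote; `voted` is the recorded selection or None.
    """
    return 1 if voted == instructed else 0


def score_ballot(vote_for_1_results, county, water, city):
    """Total raw score NCORRECT-i (0-28) and scaled score PCORRECT-i.

    `vote_for_1_results` is a list of 17 (instructed, voted) pairs; the
    remaining arguments are (instructed_set, voted_set) pairs for the
    vote-for-5, vote-for-2, and vote-for-4 contests.
    """
    total = sum(score_vote_for_1(i, v) for i, v in vote_for_1_results)
    total += score_vote_for_n(5, *county)
    total += score_vote_for_n(2, *water)
    total += score_vote_for_n(4, *city)
    return total, total / 28
```

For instance, a ballot that matches the voting instructions exactly scores 28, giving PCORRECT-i = 1.0.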
VPP Step 8.1 Total Completion Score (TCS) It has been determined that the following Total Completion Scores represent typical performance of the established calibration systems. System X previous results (X_nominal): nominal TCS = 100/102 = 0.9804 System X current results (X_current): current TCS = 100/105 = 0.9524 If the 95% confidence interval (CI) for the current results from the calibration system does not contain the appropriate value, then the results of this execution of the VPP are invalid and must be ignored. For example, suppose you are using System X (with nominal results X_nominal TCS = 0.9804) as the calibration system and the current results (X_current) are 100/105 (i.e. 100 successes in 105 attempts). Using the Adjusted Wald Method (click for online calculator), we find the 95% CI for this score to be [0.8906, 0.9823]. Since this CI contains the target value of 0.9804, there is no strong reason to assume that the current procedure is aberrant. However, a current TCS of 100/106 yields a CI of [0.8795, 0.9763], which would indicate lack of validity because it does not contain the target value. There are Perl scripts to support the TCS calculation. VPP Step 8.2 Mann-Whitney Analysis of Raw Scores The Mann-Whitney test compares the distribution of nominal raw scores against the distribution of current scores. If these are sufficiently similar, the test procedure is assumed to be valid. The test involves computing a so-called "U" score as a result of comparing the distributions. Take all pairs of scores (one from the current set and one from the nominal set). For each pair in which the current score is less than the nominal score, add 1 to U. For each pair in which the scores are equal, add 1/2 to U. 
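The Adjusted Wald interval used in this step can also be computed directly. The following Python sketch is illustrative only (the function name is an assumption; the cited online calculator and Perl scripts remain the authoritative tools):

```python
import math


def adjusted_wald_ci(successes, trials, z=1.96):
    """95% Adjusted Wald confidence interval for a proportion (sketch).

    Per the Adjusted Wald method cited in this step: add z^2/2 to the
    successes and z^2 to the trials, then apply the ordinary Wald
    formula to the adjusted proportion. Results are clamped to [0, 1].
    """
    n_adj = trials + z * z
    p_adj = (successes + z * z / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return (max(p_adj - margin, 0.0), min(p_adj + margin, 1.0))
```

For example, adjusted_wald_ci(100, 105) reproduces approximately [0.8906, 0.9823], the interval given in the worked example for System X.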
The mean and standard deviation for U can be used to derive a z-score. Let ns = number of nominal scores and cs = number of current scores. Then:

U_mean = ns * cs / 2
U_std_dev = sqrt( ns * cs * (ns + cs + 1) / 12 )
z-score = (U - U_mean) / U_std_dev

If the z-score is outside the normal 95% CI (that is, not between -1.96 and +1.96), the two distributions are different enough to indicate an invalid test. For instance, suppose that we have exactly 100 values for both distributions and that we compute U = 5739.5, as described above. The mean for U is 5000, and the standard deviation is 409.27, yielding a z-score of 1.8069. Since this lies within the normal CI of [-1.96, 1.96], the test may be assumed to be valid. There are Perl scripts to support the Mann-Whitney calculation. VPP Step 9. Analyze data After ensuring that the results from the calibration system do not indicate invalid test results, we now analyze the results from the VSUT and compare them against the benchmarks set out in section 3.2.1.1 XREF of the VVSG. VPP Step 9.1 Effectiveness: Total Completion Score (TCS) The TCS is calculated simply as the ratio of the number of participants who successfully cast a ballot to the number of participants who attempted to vote on the system, i.e. TCS = NCAST / NPART. For instance, if 106 subjects attempted to vote, and 4 failed to cast the ballot, then TCS = 102/106 = 0.9623, and the associated 95% CI = [0.9039, 0.9883]. This CI is derived using the Adjusted Wald formula, which you can compute by entering the numerator and denominator into this online calculator or by using this Perl script. Of course, the Perl script also allows you to inspect the details of the computation. FP => If the high end of the TCS CI is less than the benchmark value of 98%, then, for requirement "Total completion performance", the system fails, otherwise it passes. 
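The U and z-score computation described above is short enough to sketch directly. This Python version is illustrative only (the function name is an assumption; the referenced Perl scripts remain the supported implementation):

```python
import math


def mann_whitney_z(current, nominal):
    """Mann-Whitney z-score as defined in this calibration step (sketch).

    U counts the pairs in which the current score is less than the
    nominal score, with tied pairs contributing 1/2. The z-score then
    normalizes U by its mean and standard deviation under the null
    hypothesis that the two distributions are the same.
    """
    u = 0.0
    for c in current:
        for n in nominal:
            if c < n:
                u += 1.0
            elif c == n:
                u += 0.5
    cs, ns = len(current), len(nominal)
    u_mean = ns * cs / 2
    u_std_dev = math.sqrt(ns * cs * (ns + cs + 1) / 12)
    return (u - u_mean) / u_std_dev
```

With 100 scores in each distribution and U = 5739.5, this formula yields z = 1.8069, as in the worked example.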
Note that since this test (and the following ones) is "one-sided" (failure occurs only if the benchmark is on the high side of the CI), it is even more conservative than implied by the figure of 95% for the CI. The probability of "false" failure is at most 2.5%. And of course, the farther the true value is below the benchmark, the lower that probability. The benchmark calculation Perl scripts (as described here) perform the TCS calculation. VPP Step 9.2 Effectiveness: Perfect Ballot Index (PBI) The PBI is the ratio of the number of cast ballots containing no erroneous votes (i.e. raw score = 28) to the number of cast ballots containing one or more errors (raw score < 28), i.e. PBI = NPERFECT / (NCAST - NPERFECT). In the following example, let us assume there are 60 perfect ballots and 40 imperfect ballots - a measured PBI of 1.5 (60 / 40). Apply the Adjusted Wald formula to the number of perfect ballots (successes = 60) and the number of all cast ballots (100). Let H = the high end of the resulting 95% CI (0.6907). Therefore the high end of the CI is equivalent to a PBI of H / (1 - H) = 0.6907 / 0.3093 = 2.233. FP => If the high end of the PBI CI is less than the benchmark value of 2.33, then, for requirement "Perfect ballot performance", the system fails, otherwise it passes. The benchmark calculation Perl scripts (as described here) perform the PBI calculation. VPP Step 9.3 Effectiveness: Voter Inclusion Index (VII) The VII is based on the set of accuracy scores (PCORRECT-i) for all the participants who cast their ballots. First compute the mean (VII_M) and standard deviation (VII_SD) for all the PCORRECT-i scores. The VII is calculated as a capability index (see "What is Process Capability?"). Set VII = (VII_M - 0.85) / (3 * VII_SD). 
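The PBI conversion in Step 9.2 can be sketched as follows. This illustrative Python fragment applies the Adjusted Wald interval to the proportion of perfect ballots and then converts the high end H to a PBI via H / (1 - H); the function name is an assumption, and the referenced Perl scripts remain the supported implementation.

```python
import math


def pbi_ci_high(n_perfect, n_cast, z=1.96):
    """High end of the 95% CI for the Perfect Ballot Index (sketch).

    Computes the Adjusted Wald high endpoint H for the proportion of
    perfect ballots among all cast ballots, then maps it to a PBI
    bound via the odds transform H / (1 - H).
    """
    n_adj = n_cast + z * z
    p_adj = (n_perfect + z * z / 2) / n_adj
    h = p_adj + z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return h / (1 - h)
```

Using the example above, pbi_ci_high(60, 100) gives approximately 2.233, below the 2.33 benchmark.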
We calculate the high end of the 95% CI by adding to VII the margin:

1.96 * sqrt( 1 / (9 * NCAST) + VII^2 / (2 * (NCAST - 1)) )

For example, suppose we had 100 cast ballots with an average accuracy of 0.93 and a standard deviation of 0.11. The measured VII would then be 0.242, with a 95% CI of [0.169, 0.316]. Since the entire CI is below the benchmark of 0.35, the system in this example would fail. FP => If the high end of the VII CI is less than the benchmark value of 0.35, then, for requirement "Voter inclusion performance", the system fails, otherwise it passes. The benchmark calculation Perl scripts (as described here) perform the VII calculation. VPP Step 9.4 Efficiency: Average Time on Task We consider only those participants who successfully cast their ballots. The average time on task is calculated simply as the sum of the times taken (TASKTIME-i) divided by their number (NCAST). VPP Step 9.5 Satisfaction: Confidence and Likability There are two satisfaction metrics, each measured using a Likert scale of 1-5. Both are calculated as simple means: the sum of scores for all those who cast ballots, divided by NCAST. VPP Step 10. Report system results "Usability metrics from the Voting Performance Protocol" (3.2.1.1-D XREF) and its sub-requirements apply to the test lab, not to the VSUT. Therefore, this is not a pass/fail requirement. Rather, it directs the test lab to report all the metrics described above for effectiveness, efficiency, and satisfaction to the EAC as part of the test report. 
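The VII computation and its CI margin can be sketched directly from the definitions above. This Python fragment is illustrative only (the function name is an assumption; the referenced Perl scripts remain the supported implementation):

```python
import math


def vii_ci(mean_pcorrect, sd_pcorrect, n_cast, z=1.96):
    """Voter Inclusion Index and its 95% CI (sketch).

    VII is a capability index of the PCORRECT-i scores against the
    0.85 floor; the CI margin follows the formula given in Step 9.3.
    Returns (VII, (low, high)).
    """
    vii = (mean_pcorrect - 0.85) / (3 * sd_pcorrect)
    margin = z * math.sqrt(1 / (9 * n_cast) + vii ** 2 / (2 * (n_cast - 1)))
    return vii, (vii - margin, vii + margin)
```

With 100 cast ballots, mean 0.93, and standard deviation 0.11, this reproduces the worked example: VII = 0.242 with a 95% CI of approximately [0.169, 0.316].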
Report items should include:

Calibration system data - current results
- Identification (make/model/version) of the calibration system
- NPART and NCAST
- Measured TCS = NCAST / NPART
- 95% CI for TCS
- Distribution of raw scores
- Z-score resulting from Mann-Whitney comparison of current and nominal distributions

Effectiveness Metrics for the VSUT
- NPART and NCAST
- Measured TCS = NCAST / NPART
- 95% CI for TCS
- Distribution of raw scores
- NPERFECT
- Measured PBI = NPERFECT / (NCAST - NPERFECT)
- 95% CI for PBI
- Mean and standard deviation for the distribution of scaled scores (PCORRECT-i)
- Measured VII
- 95% CI for VII

Efficiency Metrics for the VSUT
- Mean for distribution of TASKTIME-i
- Standard deviation for distribution of TASKTIME-i (optional)

Satisfaction Metrics for the VSUT
- Mean for distribution of confidence responses
- Mean for distribution of likability responses

Test Method: Usability Testing by Manufacturer Covers requirements: 3.2.1.2-A Usability Testing by Manufacturer for General Population 3.2.7-A.4 Usability Testing by Manufacturer for Alternative Languages 3.2.8.1-B Usability Testing by Manufacturer for Poll Workers 3.3.2-A Usability Testing by Manufacturer for Voters with Low Vision 3.3.3-A Usability Testing by Manufacturer for Blind Voters 3.3.4-A Usability Testing by Manufacturer for Voters with Dexterity Disabilities A usability expert who is familiar with the Common Industry Format (CIF) shall examine the TDP to ensure the existence and adequacy of the test report submitted by the manufacturer. The expert shall verify that the report conforms to the formatting and content requirements of the CIF. The expert shall verify that the demographic characteristics of the subject pool meet the specifications of the particular requirement. Note that there are no requirements pertaining to the quantitative results of the test. Most of the usability tests are oriented towards voters, and accordingly, the tasks within the test must encompass some voting activity.
The usability tests for poll workers must encompass setup, operation, and shutdown of the system. Unlike other CRTMs, this is not a test method that gets executed once in order to cover several requirements, but rather a test method that gets executed once per requirement. F => If the formatting or content does not conform to the CIF, then, for requirement "Usability Testing by Manufacturer", the system fails. F => If the subject pool does not conform to the required demographic characteristics, then, for requirement "Usability Testing by Manufacturer", the system fails. F => If the tasks are not relevant (for voters or poll workers, as appropriate), then, for requirement "Usability Testing by Manufacturer", the system fails. Test Method: Editable Ballot Session Covers requirements: 3.2.2-D Notification of Ballot Casting 3.2.2.1-A Prevention of Overvotes 3.2.2.1-B Warning of Undervotes 3.2.2.1-C Independent Correction of Ballot 3.2.2.1-D Ballot Editing per Contest 3.2.2.1-E Contest Navigation If the VSUT has an audio interface (i.e. within class VEBD-A), this test method must be enacted for both the visual and audio interface. The tester shall fill out the ballot using the default ballot choices, except as follows: While voting contest #2 (US Senate), the tester shall first indicate a vote for Dennis Weiford and then change it to Lloyd Garriss; this change must be possible before advancing to the next contest. Just before voting in contest #8 (State Assemblyman), navigate sequentially backward to contest #5 (Lieutenant-Governor), and then forward to contest #8 again. It should be possible to see and modify the votes cast in contests #5, #6, and #7 (Lieutenant-Governor, Registrar of Deeds, and State Senator). No votes are to be indicated in contest #11 (Water Commissioners). PF => If the editing within contest #2 can be performed, then, for requirement "Ballot Editing per Contest", the system passes, otherwise it fails.
PF => If the navigation among contests #5, 6, 7, and 8 can be performed, then, for requirement "Contest Navigation", the system passes, otherwise it fails. After initial completion of the ballot, the tester shall attempt to add a vote for John Hewetson in contest #2 (US Senate). This must be done without "clearing" the prior vote for Lloyd Garriss. The system may either refuse to accept the new vote or may change the selection from Garriss to Hewetson, but may not indicate a vote for both. Then the tester shall attempt to add a vote for Harvey Eagle in contest #12 (City Council). Again, the system may either refuse the new selection or change an old one, but it may not indicate the addition of a 5th vote. FP => If the system at any point indicates more votes within a contest than allowed, then, for requirement "Prevention of Overvotes", the system fails, otherwise it passes. The tester shall then attempt to add a vote for candidate Orville White in contest #11 (Water Commissioner) and then proceed to a point just prior to final casting of the ballot. F => If by this time no warning has been given about undervoting in contest #11, then, for requirement "Warning of Undervotes", the system fails. If there has been a warning, return to contest #11 and add a vote for Gregory Seldon, so that the contest is no longer undervoted. Then withdraw the vote for Sheila Moskowitz in contest #9 (County Commissioner) and again proceed to the point just prior to final casting of the ballot. F => If by this time no warning has been given about undervoting in contest #9, then, for requirement "Warning of Undervotes", the system fails. The tester shall then attempt to change the vote in contest #7 (State Senator) from Marty Talirico to Edward Shiplett. F => If this change (or any of the previous changes) cannot be done autonomously, then, for requirement "Independent Correction of Ballot", the system fails.
Finally, the tester shall follow the system's instructions so as to cast the ballot (including final review and/or verification, as available). Upon doing so, the system must notify the voter that the ballot has been cast successfully. PF => If the system notifies the voter that the ballot was cast, then, for requirement "Notification of Ballot Casting", the system passes, otherwise it fails. Test Method: Non-Editable Ballot Session Covers requirements: 3.2.2-D Notification of Ballot Casting 3.2.2.2-A Notification of Overvoting 3.2.2.2-B Notification of Undervoting 3.2.2.2-D Ballot Correction or Submission Following Notification There are three sub-tests to be carried out in succession. The tester shall set up the VSUT to be in each of these states:

1. Warn about all overvoting and all undervoting
2. Warn about all overvoting, and about undervoting only for City Council (contest #12)
3. Warn about all overvoting but not about undervoting

F => If the system cannot be configured to these three states, then, for requirement "Notification of Undervoting", the system fails. For each of these conditions, the tester shall fill out the ballot in the standard way, except as follows: When voting contest #2 (US Senate), the tester shall indicate a vote for both Dennis Weiford and Lloyd Garriss (overvote). When voting contest #8 (State Assemblyman), the tester shall not indicate a vote for any candidate (undervote). When voting contest #11 (Water Commissioner), the tester shall write in a vote for Bob Johnson, as well as voting for both Orville White and Gregory Seldon (overvote). When voting contest #12 (City Council), the tester shall vote only for Donald Davis, Hugh Smith, and Reid Feister (undervote). When voting Retention Question #1 (Retain Robert Demergue as Chief Justice) the tester shall mark neither the "yes" nor "no" boxes (undervote). When voting Referendum #2 (PROPOSED CONSTITUTIONAL AMENDMENT D) the tester shall mark both the "yes" and "no" boxes (overvote).
When voting Referendum #4 (PROPOSED CONSTITUTIONAL AMENDMENT K) the tester shall mark neither the "yes" nor "no" boxes (undervote). For each of the three sub-tests, the system must issue the appropriate warnings. It must always warn about overvoting in exactly these contests: Contest #2 (US Senate), Contest #11 (Water Commissioner), and Referendum #2 (PROPOSED CONSTITUTIONAL AMENDMENT D). F => If the system does not consistently warn about all three of these contests being overvoted, then, for requirement "Notification of Overvoting", the system fails. For sub-test #1 (all undervote warnings are enabled) it must warn about undervoting in exactly these contests: Contest #8 (State Assemblyman), Contest #12 (City Council), Retention Question #1 (Retain Robert Demergue as Chief Justice), and Referendum #4 (PROPOSED CONSTITUTIONAL AMENDMENT K). F => If the system does not issue undervote warnings for exactly these four contests, then, for requirement "Notification of Undervoting", the system fails. For sub-test #2 (undervote warning for contest #12 only) it must warn about undervoting for just that contest. F => If the system does not issue an undervote warning for exactly that contest, then, for requirement "Notification of Undervoting", the system fails. For sub-test #3 (all undervote warnings are disabled) it must not issue any undervote warnings. F => If the system issues any undervote warning for sub-test #3, then, for requirement "Notification of Undervoting", the system fails. At the conclusion of each sub-test, the tester shall attempt final casting of the ballot. The system must then give the tester the opportunity to correct his/her ballot. Typically, an optical scanner would return the paper ballot for correction, although other mechanisms may be possible. F => If no such opportunity for ballot correction is given, then, for requirement "Ballot Correction or Submission Following Notification", the system fails.
If allowed to correct, the tester shall mark the "yes" box for Referendum #4 (PROPOSED CONSTITUTIONAL AMENDMENT K), so as to correct that one undervote, and then re-submit the ballot. For each sub-test, the system must again warn about the uncorrected overvotes as above. F => If the system does not consistently warn about the three overvoted contests, then, for requirement "Notification of Overvoting", the system fails. For the first two sub-tests, the system must again warn about the uncorrected undervotes as above. That is, in sub-test #1, it must warn about Contest #8 (State Assemblyman), Contest #12 (City Council), and Retention Question #1; and in sub-test #2 it must warn about Contest #12 (City Council) only. F => If the system does not warn about the uncorrected undervotes in sub-tests #1 and #2, then, for requirement "Notification of Undervoting", the system fails. F => If the system issues any undervote warning for sub-test #3, then, for requirement "Notification of Undervoting", the system fails. The tester shall then attempt to submit his/her ballot without further correction (i.e. for all sub-tests, there is only one attempt to correct). F => If the system refuses to accept final casting of the ballot, then, for requirement "Ballot Correction or Submission Following Notification", the system fails. PF => If the system notifies the voter that the ballot has been cast successfully, then, for requirement "Notification of Ballot Casting", the system passes, otherwise it fails. Test Method: Privacy of Voting Session Covers requirements: 3.2.3.1-A System Support of Privacy 3.2.3.1-A.1 Visual Privacy 3.2.3.1-A.2 Auditory Privacy 3.2.3.1-A.3 Privacy of Warnings 3.2.3.1-A.4 No Receipts The system shall be set up using a layout compatible with the manufacturer's instructions. The layout includes the position and orientation of the equipment in relation to other polling place activity, such as a check-in desk, location of poll workers and judges, and of waiting voters.
This test requires two testers, a "voter" and a "bystander". The "voter" shall proceed through an entire voting session. For editable interfaces, use the editable ballot session above; for non-editable interfaces, use the non-editable ballot session (with warnings enabled for overvote and undervote). Note that the latter potentially includes submitting a ballot to a scanner and then correcting it and re-submitting. The voter should follow the instructions for voting as given by the system, including e.g. procedures for changing a ballot or for the use of a privacy sleeve. The point is to see whether privacy is violated even if the voter acts conscientiously. In the case of an Acc-VS, the session must be enacted three times, using:

- the conventional visual-tactile interface
- the audio interface
- the synchronized audio/visual interface with wheelchair access and the non-manual controls provided for voters with dexterity disabilities

The "bystander" shall approach the voting station as closely as would typically be allowed in a polling place environment, if the bystander were an election official or another voter. A bystander would typically not be allowed to stand right next to the voter. The bystander attempts to determine any of the "voter's" choices through either visual or auditory cues. This attempt continues throughout the entire voting session, including ballot verification (e.g. as with a VVPAT system) and casting (not just when the voter is at the voting station). FP => If the "bystander" can discover any voter choices via visual cues, then, for requirement "Visual Privacy", the system fails, otherwise it passes. FP => If the "bystander" can discover any voter choices via auditory cues, then, for requirement "Auditory Privacy", the system fails, otherwise it passes. FP => If the "bystander" can discover any voter choices via warnings, then, for requirement "Privacy of Warnings", the system fails, otherwise it passes.
FP => If the "bystander" can discover any voter choices by any other plausible means, then, for requirement "System Support of Privacy", the system fails, otherwise it passes. FP => If the system issues a receipt whereby a voter could prove to another party how he or she voted, then, for requirement "No Receipts", the system fails, otherwise it passes. Test Method: Privacy of Cast Vote Record (CVR) Covers requirements: 3.2.3.2-A No Recording of Alternative Languages 3.2.3.2-B No Recording of Accessibility Features This test should be run after the voting sessions that test for alternative languages (XREF Section 3.2.7) and for access by blind voters (XREF section 3.3.3). Also, if there is no electronic CVR for the system, then this test does not apply. The tester shall examine the TDP for the system and determine the format of the electronic Cast Vote Record (CVR) to ensure that no accessibility data or alternative language data is part of the CVR design. F => If the format of the CVR includes information on the language used by the voter, then, for requirement "No Recording of Alternative Languages", the system fails. F => If the format of the CVR includes information on the accessibility features used by the voter, then, for requirement "No Recording of Accessibility Features", the system fails. The tester shall examine a representation of the CVR generated by other voting sessions that tested for alternative languages and for access by blind voters, in which such data was potentially generated, and verify that such data was not recorded. FP => If alternative language data was preserved in the CVR, then, for requirement "No Recording of Alternative Languages", the system fails, otherwise it passes. FP => If accessibility data was preserved in the CVR, then, for requirement "No Recording of Accessibility Features", the system fails, otherwise it passes.
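The CVR examination above amounts to scanning the record format for any field that could encode language or accessibility usage. A minimal sketch of such a scan is shown below; the field names and keyword list are invented for illustration, since the real CVR format must be determined from the system's TDP.

```python
# Hypothetical sketch of the CVR privacy check. Field names and the
# keyword list are invented for illustration only.

FORBIDDEN_KEYWORDS = ("language", "audio", "accessib", "font", "contrast")

def cvr_privacy_violations(cvr_record):
    """Return the names of CVR fields that suggest recorded alternative
    language or accessibility data (neither may appear in a CVR)."""
    return [field for field in cvr_record
            if any(word in field.lower() for word in FORBIDDEN_KEYWORDS)]

# A CVR holding only contest selections passes; one that also records
# the ballot language fails.
clean = {"contest_01": "Dennis Weiford", "contest_02": "Lloyd Garriss"}
leaky = dict(clean, ballot_language="es")
print(cvr_privacy_violations(clean))   # []
print(cvr_privacy_violations(leaky))   # ['ballot_language']
```

A keyword scan like this can only flag suggestively named fields; the tester must still read the TDP's format definition, since privacy-violating data could be stored under an innocuous name.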
Test Method: Language Clarity Covers requirements: 3.2.4-C Plain Language 3.2.4-C.1 Clarity of Warnings 3.2.4-C.2 Context before Action 3.2.4-C.3 Simple Vocabulary 3.2.4-C.4 Start Each Instruction on a New Line 3.2.4-C.5 Use of Positive 3.2.4-C.6 Use of Imperative Voice 3.2.4-C.7 Gender-based Pronouns This section describes the assessment of language usage in voting system documentation for the VVSG tests. The Guidelines for Writing Clear Instructions and Messages for Voters and Poll Workers provide a basis for determining whether a given system's documentation is written at a professionally-recognized level of quality. Two experts in the use of plain language shall proceed through an entire voting session and check the clarity of instructions and warnings intended for the voter. General principles and best practices known to the experts shall be used as criteria, as well as the sub-requirements of 3.2.4-C XREF. Note that these sub-requirements are all recommendations ("should") and are not to be treated as absolutes. For example, one instance of the use of passive voice does not necessarily entail failure of the (mandatory) main requirement. For editable interfaces, use the editable ballot session above; for non-editable interfaces, use the non-editable ballot session (with warnings enabled for overvote and undervote). If the VSUT has an audio interface (i.e. within class VEBD-A), the editable ballot session must be enacted for both the visual and audio interface. In a real election, some of the messages intended for the voter originate with the voting system and some are mandated by election law. The VVSG requirements apply only to the former. However, in this testing situation, the system implements the NIST standard test ballot specification, which does not place constraints on the wording to be used for instructions and warnings, and so these are all subject to scrutiny.
FP => If the system instructions are unclear enough that voters would have significant difficulty understanding warnings, notices, or instructions, then, for requirement "Plain Language", the system fails, otherwise it passes. As the language review is taking place, the experts should also note violations of the detailed sub-requirements, even though these are not mandatory. Warnings and alerts issued by the voting system should state: a. the nature of the problem; b. whether the voter has made a mistake or whether the voting system itself has malfunctioned; and c. the set of responses available to the voter. PF => If all warnings or alerts clearly address these three aspects, then, for requirement "Clarity of Warnings", the system passes, otherwise it fails. PF => If system instructions first state the condition, and then the action to be taken, then, for requirement "Context before Action", the system passes, otherwise it fails. PF => If system instructions use familiar words and avoid technical or specialized words, then, for requirement "Simple Vocabulary", the system passes, otherwise it fails. PF => If each logically distinct instruction starts on a new line, then, for requirement "Start Each Instruction on a New Line", the system passes, otherwise it fails. PF => If system instructions generally state what to do, rather than what to avoid, then, for requirement "Use of Positive", the system passes, otherwise it fails. PF => If system instructions directly address the voter, then, for requirement "Use of Imperative Voice", the system passes, otherwise it fails. PF => If system instructions avoid the use of gender-specific pronouns, then, for requirement "Gender-based Pronouns", the system passes, otherwise it fails. 
Test Method: Ballot Design Covers requirements: 3.2.4-E Ballot Design 3.2.4-E.1 Contests Split among Pages or Columns 3.2.4-E.2 Indicate Maximum Number of Candidates 3.2.4-E.3 Consistent Representation of Candidate Selection 3.2.4-E.4 Placement of Instructions 3.2.4-F Conventional Use of Color 3.2.4-G Icons and Language Two experts in ballot design shall proceed through an entire voting session and check the ballot design. General principles and best practices known to the experts shall be used as criteria, as well as the sub-requirements of 3.2.4-E XREF and requirements 3.2.4-F XREF and 3.2.4-G XREF. Note that some of the sub-requirements of 3.2.4-E XREF are mandatory and some are recommendations. For editable interfaces, use the editable ballot session above; for non-editable interfaces, use the non-editable ballot session (with warnings enabled for overvote and undervote). If the VSUT has an audio interface (i.e. within class VEBD-A), the editable ballot session must be enacted for both the visual and audio interface. Note that certain sub-requirements (e.g. "Contests split among pages or columns") apply only to a visual interface. In a real election, some aspects of ballot design originate with the voting system and some are mandated by election law. The VVSG requirements apply only to the former. However, in this testing situation, the system implements the NIST standard test ballot specification, which does not place constraints on ballot design, and so the full design is subject to scrutiny. After proceeding through the voting session, the experts determine the kind and severity of ballot design problems exhibited by the system. F => If there are ballot design problems serious enough that voters would have significant difficulty understanding and executing the ballot, then, for requirement "Ballot Design", the system fails.
F => If any of the mandatory sub-requirements of 3.2.4-E XREF are not met, then, for requirement "Ballot Design", the system fails. It is expected that all the contests in the NIST standard test ballot specification (except contest #4 for Governor) will fit on a single page or screen. PF => If all the contests except for Governor are presented on a single page or screen, then, for requirement "Contests Split among Pages or Columns", the system passes, otherwise it fails. PF => If every contest clearly indicates the maximum number of choices for which one can vote, then, for requirement "Indicate Maximum Number of Candidates", the system passes, otherwise it fails. PF => If all contests maintain the same relationship between the name of a candidate and the mechanism used to vote for that candidate, then, for requirement "Consistent Representation of Candidate Selection", the system passes, otherwise it fails. PF => If ballot instructions are placed near to where they are needed by the voter, then, for requirement "Placement of Instructions", the system passes, otherwise it fails. PF => If all uses of color within the ballot conform to common conventions, then, for requirement "Conventional Use of Color", the system passes, otherwise it fails. PF => If every ballot icon is accompanied by a corresponding linguistic label, then, for requirement "Icons and Language", the system passes, otherwise it fails. Test Method: Default Characteristics Covers requirements: 3.2.5-B Resetting of Adjustable Aspects at End of Session 3.2.5-C Ability to Reset to Default Values This test involves as many as six display characteristics. 
Characteristic            System Class
Font size                 VEBD-V
Contrast                  VEBD-V
Audio volume              VEBD-A
Rate of speech            VEBD-A
Color saturation          Acc-VS
Synch Audio/Video Mode    Acc-VS

The tester shall proceed through a voting session using the default ballot choices, vote in the first three contests, and note the initial appearance (audio as well as visual) of each of the applicable characteristics listed above. Set the system to full audio/video mode if available. Before voting in the fourth contest, the tester shall change the font size, audio volume, and color saturation, as available. The tester shall then vote in the 4th contest and move on to the 5th. The tester then activates the mechanism provided to reset all the adjustable characteristics, and finally inspects the current appearance of all the applicable characteristics. F => If the current appearance of any characteristic does not match its initial appearance, then, for requirement "Ability to Reset to Default Values", the system fails. After voting the 5th and 6th contests, the tester shall change the contrast and rate of speech as available. The tester shall then vote in the 7th contest. The tester then activates the mechanism provided to reset all the adjustable characteristics, and again inspects the current appearance of all the applicable characteristics. F => If the current appearance of any characteristic does not match its initial appearance, then, for requirement "Ability to Reset to Default Values", the system fails. Finally, if applicable, the tester shall change the synchronized audio/visual mode (e.g. if the default mode is full audio/visual, change to visual-only) and vote the 8th contest. The tester then activates the mechanism provided to reset all the adjustable characteristics, and again inspects the current appearance of all the applicable characteristics.
F => If the current appearance of any characteristic does not match its initial appearance, then, for requirement "Ability to Reset to Default Values", the system fails. The tester votes the 10th contest and then changes the font size, audio volume, color saturation, and synchronized audio/visual mode, as available. The tester then completes the voting session, leaving these characteristics in their non-default state. The tester then initiates a second voting session and proceeds until the first contest (straight party) is displayed. The tester inspects the current appearance of all the applicable characteristics. FP => If the current appearance of any characteristic does not match its initial appearance as noted in the first session, then, for requirement "Resetting of Adjustable Aspects at End of Session", the system fails, otherwise it passes. Test Method: Font Characteristics Covers requirements: 3.2.5-D Minimum Font Size 3.2.5-F Use of Sans Serif Font The tester shall proceed through an entire voting session (whether the system uses an electronic interface or an MMPB), using the default ballot choices. On each page, the tester shall measure (using a 15x magnifier) the height of capital letters in the smallest text intended for the voter. This includes any voter information, even if not part of the ballot, e.g. a page of system instructions to be posted in the voting booth. F => If any such capital letter has a height less than 3.0mm, then, for requirement "Minimum Font Size", the system fails. Also, on each page, the tester shall examine the font used for any text intended for the voter. PF => If all the text intended for the voter is presented in a sans serif font, then, for requirement "Use of Sans Serif Font", the system passes, otherwise it fails. The tester must identify any text intended for use by poll workers.
This includes such items as setup and operation manuals, quick setup or troubleshooting sheets, and labels and instructions affixed to the equipment. For each identified item, the tester shall measure (using a 15x magnifier) the height of capital letters in the smallest text intended for poll workers. F => If any such capital letter has a height less than 3.0mm, then, for requirement "Minimum Font Size", the system fails. Test Method: Use of Color Covers requirements: 3.2.5-J Accommodation for Color Blindness 3.2.5-K No Reliance Solely on Color This section describes the assessment of color usage for the VVSG tests. The Guidelines for Using Color in Voting Systems provide a basis for determining whether a given system's color usage is at a professionally-recognized level of quality. The review is performed by two experts in color vision. The experts shall proceed through a voting session using the default ballot choices, except that no vote is to be cast for Governor, so as to cause an undervote warning. The experts note any use of color beyond a simple monochrome presentation. This review applies to electronic displays and also to paper presentations, including paper ballots. The review also includes controls (such as knobs or buttons), instructions, and warnings, as well as ballot contents. The experts shall look for examples of information presentation that might be confusing to voters with common types of colorblindness, especially protanopia and deuteranopia. PF => If all presentations are judged to be readily comprehensible to colorblind voters, then, for requirement "Accommodation for Color Blindness", the system passes, otherwise it fails. The experts shall also look for examples of presentations in which color is the exclusive means of conveying information. Use of multiple colors for text is acceptable, since the text itself conveys information.
Likewise, colored icons are acceptable as long as they are otherwise distinguishable by shape or accompanying text. Examples of violation of this requirement would include icons or controls distinguishable only by color, such as the use of simple green and red buttons. PF => If no presentations are found that rely solely on color, then, for requirement "No Reliance Solely on Color", the system passes, otherwise it fails. Test Method: Scrolling and Feedback Covers requirements: 3.2.6-A No Page Scrolling 3.2.6-B Unambiguous Feedback for Voter's Selection The tester shall proceed through an entire voting session using the default ballot choices. For VEBD-V systems, the tester shall observe whether page scrolling is available. Page scrolling means that there are "off-screen" contents that can be made visible through the use of scroll bars or other mechanisms. Page scrolling is an operation on a single page and is not to be confused with simply advancing through several pages of information. FP => If the system uses page scrolling, then, for requirement "No Page Scrolling", the system fails, otherwise it passes. The tester shall also observe whether the selection of candidates and choices is conspicuously and unmistakably indicated by the system. Examples of acceptable feedback for a visual system would be an "X" or checkmark next to the chosen option or the use of highlighting around the chosen option. F => If the visual feedback mechanism does not clearly indicate voter choices, then, for requirement "Unambiguous Feedback for Voter's Selection", the system fails. For VEBD-A systems, the tester must re-enact the voting session, using the audio interface, and again observe whether the selection of candidates and choices is conspicuously and unmistakably indicated by the system. For example, a spoken confirmation such as "You have selected John Smith" would be acceptable.
F => If the audio feedback mechanism does not clearly indicate voter choices, then, for requirement "Unambiguous Feedback for Voter's Selection", the system fails. Test Method: Accidental Activation Covers requirements: 3.2.6-C Accidental Activation 3.2.6-C.1 Size and Separation of Touch Areas 3.2.6-C.2 No Repeating Keys If the VSUT is an Acc-VS, this test method must be enacted for both the ordinary and accessible controls (those designed for voters with dexterity disabilities). A tester with usability expertise shall proceed through an entire voting session using the default ballot choices. The tester shall note whether any controls or touch areas on the screen are unusually sensitive or are located so as to be susceptible to unintentional contact (e.g. some voters tend to grip a screen at its lower corners). The tester judges whether the voter has a significant chance of accidentally activating one of the controls. F => If the system presents significant vulnerabilities for accidental activation, then, for requirement "Accidental Activation", the system fails. For touchscreen systems, the tester shall examine the touch areas for at least contests #4 (Governor) and #9 (County Commissioners). Using a ruler to measure distance and a stylus to perform the touching, the tester shall determine first that the touch areas used to vote for at least the first three candidates in each contest are separated as required. F => If any vertical distance between centers of adjacent touch areas for voting is less than 0.6 inches, then, for requirement "Size and Separation of Touch Areas", the system fails. F => If any horizontal distance between centers of adjacent touch areas for voting is less than 0.8 inches, then, for requirement "Size and Separation of Touch Areas", the system fails. Then the tester shall determine that the size of the sensitive touch area for each of these candidates is at least of the size required.
F => If the size of any touch area for voting is less than 0.5 inches high or 0.7 inches wide, then, for requirement "Size and Separation of Touch Areas", the system fails. The tester shall attempt a write-in vote for county commissioner. If the write-in mechanism is an on-screen keyboard, then the tester shall determine that the letter keys also meet the size and separation requirements. F => If any vertical distance between centers of adjacent touch areas for write-ins is less than 0.6 inches, then, for requirement "Size and Separation of Touch Areas", the system fails. F => If any horizontal distance between centers of adjacent touch areas for write-ins is less than 0.8 inches, then, for requirement "Size and Separation of Touch Areas", the system fails. F => If the size of any touch area for write-ins is less than 0.5 inches high or 0.7 inches wide, then, for requirement "Size and Separation of Touch Areas", the system fails. The tester shall proceed through the voting session and note the effect of holding any manual control in place, including letter keys on a keyboard, "next page" or "previous page" icons on a touch screen, control buttons, or joysticks, and verify that none of them has a repetitive effect (e.g. holding down a "next page" control should not cause the system to advance through several pages). Particular attention should be paid to controls on an Acc-VS intended for use by voters with dexterity disabilities. FP => If a control with a repetitive effect is found, then, for requirement "No Repeating Keys", the system fails, otherwise it passes.

Test Method: Response Time
Covers requirements:
3.2.6.1-A Maximum Initial System Response Time
3.2.6.1-B Maximum Completed System Response Time for Vote Confirmation
3.2.6.1-C Maximum Completed System Response Time for All Operations
3.2.6.1-D System Response Indicator

This test requires the use of a video system with an accurate on-screen timer to record the voting session.
The timer must have a precision of at least 0.1 seconds. If the VSUT is of type VEBD-A (audio interface), this test method must be enacted for both the visual and audio interfaces. The tester shall proceed through the voting session using the editable ballot session described above, as the interaction with the system is recorded. The recording should capture screen events and also capture audio. Initial and completed response times, and the timing of a system activity indicator, shall be measured for at least the following events:
Initial activation of the ballot
Selecting a candidate
Changing a candidate selection
Transition to the next page
Transition to a previous page
Typing in the letters for a write-in candidate
Completion of typing in a write-in candidate
Final casting of the ballot
The tester must make some allowance for the sensitivity of controls. For instance, a touch area on a screen or another control may not respond until a certain amount of pressure has been exerted. FP => If the system's initial response time for any of the events is greater than 0.5 seconds, then, for requirement "Maximum Initial System Response Time", the system fails, otherwise it passes. F => If the system's visual completed response time for selection of a candidate exceeds 1.0 seconds, then, for requirement "Maximum Completed System Response Time for Vote Confirmation", the system fails. F => If the system's audio completed response time for selection of a candidate exceeds 5.0 seconds, then, for requirement "Maximum Completed System Response Time for Vote Confirmation", the system fails. FP => If the system's visual completed response time for any event exceeds 10.0 seconds, then, for requirement "Maximum Completed System Response Time for All Operations", the system fails, otherwise it passes.
F => If the system's completed visual response time for any event is greater than 1.0 second, but no system activity indicator appears within 0.5 seconds, then, for requirement "System Response Indicator", the system fails.

Test Method: Inactivity Time
Covers requirements:
3.2.6.1-E Voter Inactivity Time
3.2.6.1-F Alert Time

If the VSUT is of type VEBD-A (audio interface), this test method must be enacted for both the visual and audio interfaces. The tester shall determine the voter inactivity time from the system documentation. F => If the voter inactivity time is not documented, then, for requirement "Voter Inactivity Time", the system fails. F => If the voter inactivity time is documented as less than two minutes or greater than five minutes, then, for requirement "Voter Inactivity Time", the system fails. If a valid voter inactivity time is not documented, the test method may be terminated. Otherwise, the tester proceeds through the voting session up to contest #3 (US Representative). At that point, the tester ceases interaction with the system and begins timing the duration until the system issues an inactivity alert. F => If the measured inactivity time is not within 5% of the documented inactivity time, then, for requirement "Voter Inactivity Time", the system fails. Within five seconds after the alert, the tester shall cast a vote in contest #3 and verify that the system is now active again, without the need for intervention by a poll worker. F => If the system cannot be re-started by the voter, then, for requirement "Alert Time", the system fails. After proceeding to contest #5 (Lieutenant-Governor), the tester shall again cease interaction with the system and again verify that the inactivity alert is given after the appropriate interval. F => If the measured inactivity time is not within 5% of the documented inactivity time, then, for requirement "Voter Inactivity Time", the system fails.
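The inactivity-time checks above reduce to simple arithmetic. The following sketch (illustrative only; the function names and seconds-based units are assumptions, not part of the protocol) shows the documented-range check and the 5% tolerance check:

```python
# Illustrative sketch of the inactivity-time criteria; not part of the protocol.

def documented_inactivity_ok(documented_s: float) -> bool:
    """Documented voter inactivity time must be two to five minutes."""
    return 120.0 <= documented_s <= 300.0

def measured_inactivity_ok(measured_s: float, documented_s: float) -> bool:
    """Measured inactivity time must be within 5% of the documented value."""
    return abs(measured_s - documented_s) <= 0.05 * documented_s
```

For example, with a documented inactivity time of 180 seconds, a measured value of 186 seconds is within the 5% (9-second) tolerance, while 200 seconds is not.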
The tester shall remain inactive and measure the "alert" time from when the inactivity alert was given until the system goes into an inactive state (i.e. does not respond to normal voter interaction). F => If the alert time is less than 20 seconds or greater than 45 seconds, then, for requirement "Alert Time", the system fails.

Test Method: Alternative Languages
Covers requirements:
3.2.7-A General Support for Alternative Languages
3.2.7-A.1 Voter Control of Language
3.2.7-A.2 Complete Information in Alternative Language
3.2.7-A.3 Auditability of Records for English Readers

The tester shall examine the TDP of the system and determine the set of alternative languages for which the manufacturer claims support. For each such language, if the primary tester is not fluent in that language, there shall be an adjunct tester who is fluent. This test requires two systems: one to serve as the "base" English system (A), and the other to serve as the alternative language system (B). For all systems (other than audio-only, such as a vote-by-phone system), the test method must be enacted in visual mode for each written alternative language. In addition, if the VSUT is of type VEBD-A, the above test method must be enacted in audio mode once for each alternative language, written or unwritten. Note that the Acc-VS supports both visual and audio modes. The tester(s) shall begin the voting session, using the English interface on system A and the alternative language interface on system B. Systems A and B are to be run "in parallel" to allow for comparison of the English and alternative presentations. For editable interfaces, use the editable ballot session above; for non-editable interfaces, use the non-editable ballot session (with warnings enabled for overvote and undervote). If system B is being tested in audio mode, system A should also be set to audio mode.
For VEBD systems only: After completing the selection of a candidate for Governor (contest #4), the tester shall switch system B back to English. F => If system B cannot be switched to English, then, for requirement "Voter Control of Language", the system fails. Review all the ballot choices made in the first four contests on system B. F => If any of the choices already made have been altered, then, for requirement "Voter Control of Language", the system fails. After completing the selection of a candidate for State Senator (contest #7), the tester shall again switch languages, changing system B back to the alternative language. F => If system B cannot be switched back to the alternative language, then, for requirement "Voter Control of Language", the system fails. The tester shall review the first seven contests to verify that ballot choices have been preserved. F => If any of the choices already made have been altered, then, for requirement "Voter Control of Language", the system fails. End of procedure for VEBD systems. Throughout the session, the tester(s) shall verify that no knowledge of English is necessary to successfully operate system B when it is in non-English mode. This includes ballot activation, selection of choices, review, verification, and ballot casting. Candidate names, however, may be presented in conventional Roman fonts. F => If any operation of system B requires knowledge of English, then, for requirement "General Support for Alternative Languages", the system fails. The tester(s) shall verify that all instructions, warnings, VVPAT material, and other text intended for the voter produced by the English system A are also produced correctly by the alternative language system B. Examples include:
Instructions and feedback on initial activation of the ballot (such as insertion of a smart card)
Instructions and feedback to the voter on how to operate the voting station, including settings and options (e.g. font size, volume control)
Instructions and feedback for navigation of the ballot
Instructions and feedback for contest choices, including the maximum number to vote for and how to write in candidates
Instructions and feedback on confirming and changing ballot choices
Instructions and feedback on final verification and casting of the ballot
PF => If system B provides all the information in the alternative language as provided in English by system A, then, for requirement "Complete Information in Alternative Language", the system passes, otherwise it fails. After completion of the session, the tester shall examine records intended for use in an audit, including paper and electronic records, as appropriate. This may require "opening up" the system and going through poll closing procedures so as to gain access to the audit records. Verify that these records are intelligible to English-only readers/auditors. In particular, paper verification records must present information in both the alternative language (so as to be accessible to the voter) and in English (so as to be accessible to the auditors). PF => If all the audit records are accessible to English-only readers, then, for requirement "Auditability of Records for English Readers", the system passes, otherwise it fails.

Test Method: Operational Usability for Poll Workers (PWU)
Covers requirements:
3.2.8-A Clarity of System Messages for Poll Workers
3.2.8.1-A Ease of Normal Operation
3.2.8.1-C Documentation Usability
3.2.8.1-C.1 Poll Workers as Target Audience
3.2.8.1-C.2 Usability at the Polling Place
3.2.8.1-C.3 Enabling Verification of Correct Operation

PWU Overview 1. General Approach
This section describes the assessment of Operational Usability for Poll Workers for the VVSG tests.
The Style Guide for Voting System Documentation and the Guidelines for Writing Clear Instructions and Messages for Voters and Poll Workers provide a basis for determining whether a given system's instructions are at a professionally recognized level of usability. This protocol determines whether or not a voting system meets the referenced set of poll worker usability requirements. It combines expert review and inspection with the results of a "mock election" in which participant poll workers follow instructions from the documentation to perform tasks. All information necessary to carry out the protocol is described in a step-by-step fashion. Following the instructions given is crucial to the validity of the test. Participant teams are recruited from groups of "experienced" and "inexperienced" poll workers, with each pair including one member from each group. During the test, participant teams are required to perform typical poll worker tasks, including opening the polls, conducting polling, and closing the polls. The expert review of system documentation determines whether or not documentation of these tasks sufficiently supports their performance by poll workers. Likewise, the expert inspection of the voting system determines whether or not the system sufficiently implements these tasks to support their performance by poll workers. The protocol is performed by two testers (a test administrator/expert and an assistant/data logger). After running portions of the protocol, where directed, the test administrator interprets the test results using the pass/fail criteria specified in this protocol. Finally, the protocol gives instructions for reporting the overall results. All testing with participants must be conducted in full accordance with human subject protection protocols.

PWU Overview 2. Acronyms
The following acronyms are used throughout the protocol:
PWU Operational Usability for Poll Workers Protocol - the test method for poll worker usability requirements.
IRB Institutional Review Board
VSTL Voting System Test Laboratory
VSUT Voting System Under Test - the system for which conformance to the VVSG is being evaluated.
VVSG Voluntary Voting System Guidelines - the set of requirements against which one tests conformance by a VSUT.

PWU Overview 3. Role of Manufacturer
The system manufacturer may or may not observe the test, according to the practices of the test lab. However, no manufacturer representative may have any contact with participants before, during, or after the test.

PWU Overview 4. Protocol Steps
Here are the major steps of the protocol:
1. Perform expert inspection
2. Recruit and schedule participants
3. Set up environment
4. Set up voting system
5. Prepare participants
6. Conduct the voting
7. Data collection
8. Analyze data
9. Report system results

PWU Step 1. Perform expert inspection
Expert review of the documentation is performed to determine whether or not the documentation is adequate to support the test. As part of this process, the expert determines to what degree the documentation has or has not been constructed based on best documentation practices, using the guides mentioned above as the basis for the determination. If the documentation is inadequate to support the test, this is expressed explicitly in the given pass/fail criteria. Also, expert inspection is performed wherein the expert determines whether or not the system is fit to perform the test (including all the requisite features, materials, etc.). Then, the expert performs a "dry run" of the test, enacting the role of a poll worker in the "mock election". If the expert determines the system unfit to perform the test, this is also expressed in the respective pass/fail criteria.

PWU Step 1.1 Expert review of documentation
Two experts in usability shall play the role of poll workers who must operate the voting system, based on the system documentation. Start with the system as intended to be delivered to the polling place.
You may assume that ballot definitions have already been loaded, but the system may be packaged as if delivered from a central warehouse. Accompanying system documentation is also delivered as it usually would be. System documentation may include instructions for complex operations and troubleshooting, but these will not be used in the test. The documentation may consist of paper manuals, quick setup guides, and electronic media, such as DVDs. The overall documentation strategy is up to the manufacturer. The experts must first find the instructions for normal setup, operation and maintenance, and shutdown. It should be reasonably easy to isolate this "poll worker" material from the documentation of more complex procedures (e.g. ballot definition, equipment repair, or diagnostic testing). The poll worker documentation is to be reviewed for clarity, organization, appropriate level of writing, internal consistency, completeness, and other attributes of good documentation usability. F => If the poll worker documentation is not written at a level readily understandable by non-experts, then, for requirement "Poll Workers as Target Audience", the system fails. F => If the poll worker documentation is not organized for easy use in a polling place situation, then, for requirement "Usability at the Polling Place", the system fails. F => If the poll worker documentation does not clearly explain how to verify that the system is in a correct state for setup, operation, and shutdown, then, for requirement "Enabling Verification of Correct Operation", the system fails. F => If the expert review of the poll worker documentation reveals any other serious problems for poll worker usability, then, for requirement "Documentation Usability", the system fails.

PWU Step 1.2 Expert inspection
Based on the documentation, the experts shall go through an entire setup / operation (including the casting of at least three full ballots) / shutdown cycle.
The purpose is to review 1) the accuracy of the documentation, and 2) the degree of difficulty of the procedures themselves. It is recognized that the setup, operation, and shutdown procedures involve a certain inherent degree of complexity; the expert review is intended to detect situations presenting special difficulties (physical or cognitive) to poll workers. F => If the documentation contains significant inaccuracies or omissions with respect to the actual procedures, then, for requirement "Documentation Usability", the system fails. F => If the procedures are judged to be excessively difficult, complex, or error-prone, then, for requirement "Ease of Normal Operation", the system fails. During this test, the experts shall review all messages and warnings generated by the system. Each message shall be reviewed for:
Accuracy - Does the message accurately reflect the state of the system?
Completeness - Does the message tell the poll worker what steps need to be taken?
Clarity - Does the message adhere to the guidance of section 3.2.4-C XREF Plain Language?
F => If any of the messages encountered is not deemed clear and usable, then, for requirement "Clarity of System Messages for Poll Workers", the system fails.

PWU Step 2. Recruit and schedule participants
If the results of the expert inspection failed any of the pass/fail criteria in Step 1, then the system is unfit to perform the test. In such a case, the test must be stopped and a report generated. Otherwise, continue. In all activities, comply with state and federal human subject protection laws (e.g., via use of IRB services).

PWU Step 2.1 Screening
Recruit 8 two-person teams (i.e., 16 participants). Each team should consist of one experienced and one inexperienced member as defined in the table below. Over-recruit by 2 "backup" participants for each group, bringing the overall total to 10 participant teams (20 participants).
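The recruiting arithmetic above can be made explicit in a small sketch (variable names are illustrative assumptions, not part of the protocol): 8 two-person teams give 16 base participants, and 2 backups for each of the two experience groups bring the total to 20 participants, i.e. 10 teams.

```python
# Illustrative arithmetic for the Step 2.1 recruiting targets; names are
# assumptions, not part of the protocol.
teams = 8
members_per_team = 2                 # one experienced + one inexperienced per team
base_participants = teams * members_per_team          # 16 participants
backups_per_group = 2
groups = 2                           # experienced and inexperienced
total_participants = base_participants + backups_per_group * groups   # 20
total_teams = total_participants // members_per_team                  # 10
```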
Use the screening questionnaire to ensure that each participant meets the following target demographics.

PWU Step 2.2 Demographics
The participant sample of poll workers is a convenience sample. Participant characteristics should be similar to the area's poll worker population. The required characteristics and the desired number of participants for each are:
Poll worker experience (ensure each poll worker team has 1 experienced and 1 inexperienced poll worker):
5 elections within 3 years (experienced): 9-11
1-2 elections within the last 12 months (inexperienced): 3-5
Attended poll worker training but may not have worked an election yet (inexperienced): 5-7
Age (try to obtain a fairly even split between age groups):
18-40: 7-9
41+: 11-13
Gender (try to obtain a fairly even distribution between gender groups):
Female: 8-12
Male: 8-12
In addition, the participant population should be limited to individuals who:
are US citizens eligible to vote (i.e., are at least 18 years of age)
are literate in English
have no significant connection to any manufacturer of voting systems - e.g. no close relative as an employee or owner

PWU Step 2.3 Scheduling and Logistics
After screening, provide participants with details regarding the test time and location. Be sure that all tester contact information and directions to the facility are provided. Obtain their contact information and explain that they will be compensated XX amount of dollars (depending on the geographic location; e.g. for the metropolitan DC area we recommend an average of $50/hr). Backup participants who stand by should also be compensated. Schedule them for a 2-hour session. Participants should be scheduled for a staggered arrival at the testing site, so as to avoid excessive waiting. Given the above session time, it would be reasonable to separate participant arrivals by about 30 minutes, but the optimum interval depends strongly on the system tested.

PWU Step 3.
Set up environment
The goal, as far as possible, is to simulate a high-quality polling place. Thus, any errors detected will not be traceable to extraneous environmental factors. There must be sufficient room in which to carry out the mock setup and voting, using a single voting station. The voting area should have the following characteristics:
Size: minimum 12' by 15' by 8' high.
Ambient lighting: in the range of 400-600 lx. If possible, use indirect lighting rather than overhead fixtures or direct sunlight so as to reduce glare.
Ambient noise: below 40 dB.
Ventilation: such as to avoid either a "stuffy" or "drafty" feeling.
Temperature: between 68 and 76 degrees Fahrenheit.
Relative humidity: between 20% and 60%.
See this OSHA guideline for more detailed recommendations. This University of Wisconsin webpage is also useful.

PWU Step 4. Set up voting system
The protocol requires the manufacturer to load the voting system with a ballot based on the NIST standard ballot specification. The manufacturer is responsible for the actual ballot design (fonts, layout, etc.). Once this ballot has been loaded on the VSUT, the test lab must leave each system in its "arrival state" from the manufacturer, the state in which a poll worker will first receive it. Upon receipt of the system, use the following checklist as well as the associated documentation to ensure all equipment, accessories, and necessary materials (such as ballots) have been provided. Prior to testing with participants, ensure that testing materials have been appropriately prepared.
Print out the following:
Test administrator steps (receipt and preparation of participants) - 1 copy per tester
Test administrator scripts (running the test, ending the test) - 1 copy per tester
Participant task sheets (task 1, task 2, task 3) - 1 copy of each
System documentation (all manuals or documentation provided) - 1 copy of each (if in printable form)
Data recording and evaluation sheets (session evaluation checklist) - 1 copy per testing session
Note: Be sure to print each participant task scenario separately in large (at least 14 pt sans serif) type on a 4 by 6-inch piece of paper. Also, for optical scan systems, prepare 3 paper ballots: 1 over-voted, 1 under-voted, and 1 fully voted. Finally, ensure that - before each participant team begins the test - the system has been returned to a fully "packed" state (as it would typically arrive from a manufacturer). For the consistency of the test, the tester must always return the system to this state after each test (and before the next participant team). Likewise, it is important that the process for achieving this "packed" state uses a consistent method of shutdown (from a "polls closed" state) each time it is performed. This includes the re-packing of accessories, etc. Performing this process establishes a consistent, fixed starting state from which all participant teams begin the test.

PWU Step 5. Prepare participants
The test staff are responsible for preparing the participants for the test procedure. On testing days, after participants arrive and check in, they meet the test administrator and are asked to sign a consent form. They are then given an overview of the test. Follow the test administrator script for preparing participants and use the provided consent form.

PWU Step 6. Conduct the voting

PWU Step 6.1 Testing
Participants are escorted to the system and given instructions.
Their assigned task is to provide poll worker support for an entire "mock election" by performing a series of typical poll worker tasks whose instructions are written in the documentation. The tasks include opening the polls, conducting polling, and closing the polls. For each task, they must locate and follow instructions from the documentation. Test staff will observe their activities while recording data and observational evidence. Once the session ends - either due to successful completion of all tasks or to non-completion of any task - the participant team is given a debriefing survey and is compensated for their participation. During testing, test staff minimize their interaction with participants to maintain the objectivity of the test. Only specific interactions are allowed, such as those required to get participants started or to deliver required instructions (as is done when using a provided script). During task performance, starting and stopping times are recorded, as well as whether or not the participant team could complete a given task using only the documentation provided. A time limit of 60 minutes is placed on the first task. If a participant team cannot complete a given task within the given constraints (e.g., the time limit on the first task, "opening polls"), their session will be considered terminated, at which time they will be thanked and compensated. If a given participant team does not arrive as planned or does not follow instructions, their session may be terminated. In such cases, contact and test the respective backup participant team as a replacement. The test administrator should use the provided scripts to run and end the testing session. Review all scripts and forms provided prior to participant sessions so you are familiar with them. Test staff must use the provided session evaluation checklist to record data and observational evidence during all tasks and also as a guide when making observations.
After each test (and before beginning this test protocol with the next participant team), follow the description in Step 4 to return the system to a "packed" starting state.

PWU Step 6.2 Successful Task Completion
Use the session evaluation checklist to decide whether or not each participant team has successfully completed a task. How to use the session evaluation checklist:
Before the test begins, review the checklist. Be sure you understand the Evidence and Criteria on the left and the task end states across the bottom (the shaded area on the form).
As participants work on tasks, keep the evidence and criteria in mind.
When participants have completed a task, or were unable to complete a task and have stopped, mark the Yes checkboxes for the task if the criteria have been met; mark the No checkboxes if the criteria have not been met.
Make a final determination for the task. Does the voting system support poll workers in this task or not? If participants were unable to complete the task according to the end state for the task, the voting system fails and the test ends. Do not go on to the next task. If participants were able to complete the task according to the end state, participants go on to the next task.

PWU Step 6.3 Collecting timing data during testing
The data logger is responsible for timing the first task. The first task is timed because a fixed amount of time (on average, about an hour) is usually scheduled for opening the polls in a real election. Timing begins when the participants finish reading the task scenario and start performing the task. Once the participant team has completed the poll worker task and has indicated this verbally, the data logger will record the elapsed time on the session evaluation checklist.

PWU Step 6.4 Sufficient number of sessions
The test should be performed until 8 valid sessions (each having a participant team) have been performed.
Note that a valid session is one in which participants follow the instructions given and testers adhere to the test protocol as specified. If, however, the entire pool of backup participants has been exhausted and the desired number of valid sessions has not been reached, the protocol must be abandoned as invalid and rerun at a later time with a new set of participants.

PWU Step 6.5 Compensation paid to Participants
As directed by the test administrator scripts, provide the participants with their respective compensation ($50.00/hr) and thank them for their time.

PWU Step 7. Data collection
The essential data recorded for each session includes:
Per-task: completed? without assistance? within the maximum allotted time (for the first task)?
Overall: completed all tasks?
All records pertaining to the test data (whether created by the voting system or by the test facilitator) should be stored safely and privately for future reference. The purpose is twofold: first, to protect participant privacy, and second, to allow any questions about the test results to be resolved based on direct evidence.

PWU Step 8. Analyze data
A pass/fail determination is made for the VSUT, based upon the number of participant teams who complete each of the three tasks (opening the polls, conducting polling, and closing the polls). Throughout, successful task completion means that the team got the VSUT into the correct state within the time allotted for that task, and without external assistance (such as coaching from the test administrator). FP => If there was any task that a majority of the participant teams did not complete because system messages were difficult to understand or follow, then, for requirement "Clarity of System Messages for Poll Workers", the system fails, otherwise it passes.
FP => If there was any task that a majority of the participant teams did not complete because the documentation was too technically complex, then, for requirement "Poll Workers as Target Audience", the system fails, otherwise it passes. FP => If there was any task that a majority of the participant teams did not complete because the documentation is not presented in a format suitable for the polling place, then, for requirement "Usability at the Polling Place", the system fails, otherwise it passes. It is important that the team not only complete the task at hand, but also recognize that it has done so. PF => If, for each of the tasks, at least half the participant teams completed the task and then confirmed that completion to the tester, then, for requirement "Enabling Verification of Correct Operation", the system passes, otherwise it fails. As the tester observes the teams completing each task, it should be evident that the documentation is actually helping them to do so. NOTE: The protocol includes the administrator instructing the participants to use the documentation. FP => If there was any task that a majority of the participant teams did not complete because the system (instructions and documentation) failed to provide clear and sufficient guidance, then, for requirement "Documentation Usability", the system fails, otherwise it passes. FP => If there was any task that a majority of the participant teams did not complete because the overall system operation is excessively difficult, complex, or error-prone, then, for requirement "Ease of Normal Operation", the system fails, otherwise it passes.

PWU Step 9. Report system results
Report items should include:
Identification (make and model) of the VSUT
Results of each pass/fail assertion evaluated (both for the expert inspection/review and for the test)
The report should be prepared in the Common Industry Format (CIF).
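The majority criterion applied in Step 8 can be sketched as follows (illustrative only; the per-team data layout and function names are assumptions, not part of the protocol):

```python
# Hypothetical sketch of the PWU Step 8 majority analysis; the data layout
# (a list of per-team dicts mapping task name -> completed?) is an assumption.
TASKS = ("opening the polls", "conducting polling", "closing the polls")

def majority_did_not_complete(sessions, task):
    """True if more than half of the participant teams failed to complete
    the task (within the allotted time and without external assistance)."""
    not_completed = sum(1 for s in sessions if not s.get(task, False))
    return not_completed > len(sessions) / 2

def analyze(sessions):
    """Return {task: 'pass'/'fail'} under the Step 8 majority criterion."""
    return {t: ("fail" if majority_did_not_complete(sessions, t) else "pass")
            for t in TASKS}
```

For example, with 8 valid sessions of which 5 teams completed every task, no task has a majority of non-completions, so all three tasks pass under this criterion. The criterion only determines majority non-completion; attributing each failure to a specific requirement (messages, documentation, or system operation) remains a judgment made by the test administrator.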
Test Method: End-to-end Accessibility (E2E-Acc) Covers requirements: 3.3.1-A Accessibility throughout the Voting Session 3.3.1-A.1 Documentation of Accessibility Procedures 3.3.1-C No Dependence on Personal Assistive Technology E2E-Acc Overview 1. General Approach This section describes the full End-to-end Accessibility Test Protocol (E2E-Acc). It tests whether a voting system ensures accessibility throughout the voting session, determining whether or not the system meets the referenced set of accessibility requirements. It combines expert inspection with the results of a "mock election" using participants with disabilities. All information necessary to carry out the protocol is described in a step-by-step fashion. Following the instructions given is critical to the validity of the test. Safety and accessibility are addressed in every phase. During the test, participants are required to perform typical voter tasks, including: approach the voting system, activate the voting session, vote contests, review (and verify) voted contests, cast the ballot, exit the voting system. The expert inspection of system documentation determines whether or not documentation of these tasks sufficiently supports their performance by participants with disabilities. Likewise, the expert inspection of the voting system determines whether or not the system sufficiently implements these tasks to support their performance by participants with disabilities. The protocol is performed by two testers (a test administrator/expert and an assistant/data logger). After running each step, where directed, the test administrator interprets the test results using the pass/fail criteria specified in this protocol. When done, instructions are given to report the overall results obtained. All testing with participants must be conducted in full accordance with human subject protection protocols. 
The testers must have expertise in accessibility for people with disabilities and experience in human subject protection. While performing this protocol, the test administrator must ensure accessibility and safety throughout the entire test including participant arrival, testing process, and departure. E2E-Acc Overview 2. Acronyms The following acronyms are used throughout the protocol: E2E-Acc End-to-end Accessibility Protocol - test method for end-to-end accessibility requirements. IRB Institutional Review Board VSTL Voting System Test Laboratory VSUT Voting System Under Test - the system for which conformance to the VVSG is being evaluated. VVSG Voluntary Voting System Guidelines - the set of requirements against which one tests conformance by a VSUT. E2E-Acc Overview 3. Role of Manufacturer The system manufacturer may or may not observe the test, according to the practices of the test lab. However, no manufacturer representative may have any contact with participants before, during, or after the test. E2E-Acc Overview 4. Protocol Steps Here are the major steps of the End-to-end Accessibility Protocol: Perform expert inspection Recruit and schedule participants Set up environment Set up voting system Prepare participants Conduct the voting Debrief participants Data collection Analyze data Report system results E2E-Acc Step 1. Perform expert inspection Expert inspection is performed on the VSUT. The expert determines whether or not the system is fit to perform the full usability test (having all the requisite features, materials, etc). Then, the expert performs a "dry run" of the test, enacting the role of a voter in the "mock election". If the expert determines the system unfit to perform the test, this is expressed explicitly in the given pass/fail criteria. If the system is fit to perform the test, proceed to participant recruitment and testing. 
E2E-Acc Step 1.1 Expert review of documentation The testers shall review the documentation provided by the manufacturer and confirm that it describes voting procedures covering at least those voters who have vision (low vision or blindness), dexterity, or mobility disabilities. It must be understandable by poll workers, who may have to explain to voters how to use the system. The documentation should explain any special factors for operating the system in a polling place environment (e.g. placement of system, necessary clearances, etc.). Session startup procedures (such as plugging in a personal headphone or initiating use of non-manual input) should be described. In particular, the documentation should specify whether the voter or poll worker is expected to perform a given startup procedure. F => If the documentation does not clearly and accurately describe voting procedures for accessibility, then, for requirement "Documentation of Accessibility Procedures", the system fails. E2E-Acc Step 1.2 Expert inspection The testers shall then attempt to follow the documented procedures throughout a voting session in as much detail as is necessary to evaluate their usability for voters with disabilities. The session must be conducted three times, using: the conventional visual-tactile interface the audio interface the synchronized audio/visual interface with wheelchair access and the non-manual controls provided for voters with dexterity disabilities. The voting session includes not only making ballot choices, but also session startup, ballot initiation, navigation, review, verification, and casting. F => If any of the procedures is judged to present significant difficulty for disabled voters, then, for requirement "Accessibility throughout the Voting Session", the system fails. F => If any step of a procedure requires the voter's use of personal assistive technology, then, for requirement "No Dependence on Personal Assistive Technology", the system fails. E2E-Acc Step 2. 
Recruit and schedule participants If the results of the expert inspection failed any of the pass/fail criteria in Step 1, then the system is unfit to perform the test. In such a case, the test must be stopped and a report generated. Otherwise, continue. In all activities, comply with state and federal human subject protection laws (e.g., via use of IRB services, etc). E2E-Acc Step 2.1 Screening Recruit 16 participants with disabilities, 4 from each of the specific disability groups: blindness, low vision, dexterity, mobility. Over-recruit by 2 backup participants for each group, bringing the overall total to 24. Use the screening questionnaire to ensure each meets the following target demographics. E2E-Acc Step 2.2 Demographics The participant population is limited to individuals who: are US citizens eligible to vote (i.e., are at least 18 years of age) are literate in English have one or more of the following disabilities (blindness, low vision, dexterity, mobility) have no significant connection to any manufacturer of voting systems - e.g. no close relative as an employee or owner E2E-Acc Step 2.3 Disabilities For each disability group, the basic voter characteristics in terms of their interaction with the voting system are described, followed by specific physical characteristics of the impairment. Blindness A blind voter is one who will require a non-visual interface (either audio or Braille) to a voting system. An audio non-visual interface would utilize headphones to ensure privacy. For the basis of these tests, the definition of Legal Blindness is used. The Snellen metric provides an accepted means of stating visual acuity, and is commonly referenced when defining level of visual impairment. Legal Blindness refers to an individual with best-corrected central visual acuity of 20/200 or worse in the better eye. Legal blindness can include individuals who have a limited central visual field (typically less than 20 degrees), even with visual acuity of better than 20/200. 
Low Vision A low vision voter will require an enhanced visual interface that supports a combination of magnification, font enlargement, and color/contrast controls. Low vision is defined as a visual acuity between 20/70 and 20/200. For the basis of these tests, a low vision voter is defined as someone with corrected visual acuity no better than 20/80, as this value is specified by several bodies as defining the baseline of moderate low vision. Dexterity A voter with a dexterity impairment may not be able to use, or have difficulty in using, the standard control mechanisms on the voting system. They may require control mechanisms that are large and that require low activation force, or require a specialized means of controlling the voting system, e.g. by activating a paddle switch. A voter with a dexterity impairment may also have limitations in mobility (see below). For the basis of these tests, a voter will have limitations in the use of their hands, with limitations in strength and reach. Mobility For the basis of these tests, a mobility impaired voter utilizes a non-motorized wheelchair and may have impaired dexterity. E2E-Acc Step 2.4 Scheduling and Logistics After screening, provide participants with details regarding the test time and location. Be sure that all the tester contact information and directions to the facility are available and provided in a format that is accessible to the participants. Obtain their contact information and inform them of the compensation amount, which depends on geographic location (recommended $100 for the metropolitan DC area). Backup participants who stand by should also be compensated. Schedule each participant for a 90-minute session. Participants should be scheduled for a staggered arrival at the testing site, so as to avoid excessive waiting. Given the session time above, it would be reasonable to separate participant arrivals by about 30 minutes, but the optimum interval depends strongly on the system tested. 
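The scheduling guidance above (90-minute sessions with arrivals staggered by roughly 30 minutes) can be sketched as a small scheduling helper. The function name, parameters, and example start time below are illustrative assumptions, not part of the protocol.

```python
# Illustrative sketch of staggered arrival scheduling (not part of the protocol).
from datetime import datetime, timedelta

def build_schedule(start, n_participants, interval_min=30, session_min=90):
    """Return (arrival, expected_finish) pairs, one per participant."""
    slots = []
    for i in range(n_participants):
        arrival = start + timedelta(minutes=i * interval_min)
        finish = arrival + timedelta(minutes=session_min)
        slots.append((arrival, finish))
    return slots

# Example: six participants starting at a hypothetical 9:00 AM test day.
schedule = build_schedule(datetime(2024, 5, 1, 9, 0), n_participants=6)
for arrival, finish in schedule:
    print(arrival.strftime("%H:%M"), "->", finish.strftime("%H:%M"))
```

With a 30-minute stagger and 90-minute sessions, at most three sessions overlap at any moment, which bounds the load on the waiting area and test staff.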
E2E-Acc Step 3. Set up environment The test facility should be accessible and must comply with the Americans with Disabilities Act (ADA) physical accessibility requirements. Please see the current regulations for ADA-compliant facilities. All facility staff involved in performing this test protocol are responsible for monitoring the safety of the participants, ensuring that the test facility is kept free of obstacles. The staff are also responsible for informing the participant regarding the location of exits and evacuation procedures for people with disabilities. In multi-story facilities, where the test lab may be located above or below ground-level exits, evacuation chairs (used to enable evacuation of mobility-impaired individuals on stairs) must be available. In addition to accessibility and safety considerations, the goal, as far as possible, is to simulate a high quality polling place. Thus, any errors detected will not be traceable to extraneous environmental factors. There must be sufficient room in which to carry out the mock voting, using a single voting system. The voting area should have the following characteristics: Size: minimum 12' by 15' by 8' high. Ambient lighting should be in the range of 400-600 lx. If possible, use indirect lighting rather than overhead fixtures or direct sunlight so as to reduce glare. Ambient noise levels should be below 40 dB. Ventilation should be such as to avoid either a "stuffy" or "drafty" feeling. Temperature should be between 68 and 76 degrees Fahrenheit. Relative Humidity should be between 20% and 60%. See this OSHA guideline for more detailed recommendations. This University of Wisconsin webpage is also useful. E2E-Acc Step 4. Set up voting system E2E-Acc Step 4.1 Voting system The protocol requires the manufacturer to set up the voting system with a ballot based on the NIST standard ballot specification. The manufacturer is responsible for the actual ballot design (fonts, layout, etc). 
Once this ballot has been loaded on the VSUT, the test lab must set the voting system up as described in its documentation and prepare it to receive votes. E2E-Acc Step 4.2 Testing room In the testing room, tables must be at an accessible height, and chairs must be provided for the participants to be seated in, both while waiting and during the actual testing. A movable tray table should be available to allow participants to place instructions or assistive devices at a convenient height and reach. Set up system and accessories Set up the voting system using all associated equipment and documentation provided so that the system is ready to be used for voting. Ensure all associated access devices (such as headphones, etc) are available. Set up observation scenarios Determine the observational scenario required for this machine. Ensure that full observation is possible (to ensure the ability to record all experimental data). Perform disability-specific preparations Ensure all accessibility scenarios are accounted for regarding the disability groups being tested on the given machine. This includes making a side table available for instructions as well as other measures to ensure a participant can access the machine as much as is possible in order to perform the test. Sanitization Ensure the whole setup, all access devices, and anything with which a participant might have physical contact, are properly sanitized (or replaced) prior to each use. This is to protect the safety and health of each participant. For common instruction formats, follow the provided guidance for accessibility of instructions. E2E-Acc Step 5. Prepare participants E2E-Acc Step 5.1 Participant arrival Facilitate participant arrival on testing days by coordinating, arranging, and possibly paying for travel if participants cannot provide their own transportation. Escort participants to the lab, where they are checked in. 
E2E-Acc Step 5.2 Participant preparations After check-in, participants are presented with a consent form (in accessible format). After signing the consent form they are given a brief overview of the test. The detailed steps include: Greet incoming participants and verify that they are here for the appropriate purpose. Administer a consent and release form, as appropriate. Here is an example of the form NIST used during development of the test, but this should be customized to suit the test lab. Usually, you would witness the participant's signature and then sign the form yourself as a witness. Deliver general instructions to the participant (according to their disabilities). Have him or her review the instructions. Escort the participant inside the testing room to the mock registration desk. Deliver voting instructions to the participant (according to their disabilities). Have him or her review the instructions. Do not coach the participant on strategies for voting or on how to use the voting system. The goal is to minimize, if not eliminate entirely, any "facilitator effect". Finally, escort the participant to the system to begin their tasks. E2E-Acc Step 6. Conduct the voting E2E-Acc Step 6.1 Testing Participants are escorted to the machine and given instructions (in accessible format). Their assigned task is to vote, based on a provided voting script, using a designated voting machine. Test staff will observe their interaction, recording data and observational evidence. Once the participant's session ends - either due to successful completion of the task or to non-completion of the task - the participant is given a debriefing survey and is compensated for their participation. During testing, test staff minimize their interaction with participants to maintain the objectivity of the test. Only specific interactions are allowed, such as those required to get participants started or to deliver required instructions to them. 
Accommodations are made for each participant according to their disabilities. Each machine provides a set of access devices designed to facilitate use by voters with disabilities. Participants from a given group are asked to vote using a specific device. Instructions to blind participants need to be delivered in audio/verbal format. They will execute each instruction and then ask for the next instruction. For participants with mobility or dexterity disabilities, place instructions on a table/cart within reach next to the voting machine. Low vision participants will be provided with instructions written in large formats. Participant tasks include approaching the machine, activating the voting session, voting contests, reviewing and verifying those votes, casting the ballot, and exiting the voting system. For each of these tasks (as well as the overall encompassing task of voting), starting and stopping times are recorded, as well as whether or not the participant could complete a given task without assistance. Each task is assigned an allotted time. NIST recommends the following default maximum allotted task times, itemized by task: Approach voting system - 10 minutes Activate voting system - 10 minutes Vote (make selections according to script) - 10 minutes Review (and verify) ballot - 10 minutes Cast vote - 10 minutes Exit voting system - 10 minutes E2E-Acc Step 6.2 Tester Roles During the protocol, testers enact different roles. Those roles include: Facilitator A designated member of the project team, with training in human subject protection and experience in communicating with people with disabilities, assumes the role of facilitator for the test session. The facilitator is responsible for welcoming the study participant on the day of testing, administering the informed consent form, and delivering the general instructions. The facilitator ushers the participant into the testing room and to the Registration Desk, where task-specific instructions are delivered. 
The instructions are offered in printed form (standard or large print) or presented in spoken form to the participants, based upon their disability. All interactions with the participant are scripted for consistency in delivery. Once the actual test begins, the facilitator assumes the role of observer for purposes of data collection and has no further interaction with the participant until the end of the data collection session. Mock Poll Worker A second member of the project team, with training in human subject protection and experience in communicating with people with disabilities, assumes the role of mock poll worker. The mock poll worker is familiar with the manufacturer-supplied documentation and instructions related to the use of the VSUT and introduces the participant to the VSUT and voter access card (or other such mechanisms related to the VSUT). All interactions with the participant are scripted for consistency in delivery. The mock poll worker assists the participant in the initiation of the voting session only to the extent that such assistance is specifically defined in the vendor-supplied documentation and is scripted for consistency in delivery. No further assistance from the mock poll worker is allowed, unless specifically defined in the test protocol. In an actual election and polling station, a voter who encounters difficulties may be able to request assistance from a poll worker. Because such assistance is inconsistent at best and not in keeping with the goal of voting with independence for voters with disabilities, no assistance is allowed in the VSUT test session and any such assistance may invalidate the result. The specific purpose of this test is to verify the direct accessibility of the VSUT, in a consistent and repeatable manner. 
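The default maximum allotted task times listed in Step 6.1, combined with the no-assistance rule, can be encoded as a simple completion check. This is an illustrative sketch; the task names, data layout, and function names are assumptions, not part of the protocol.

```python
# Sketch of per-task completion checking against the Step 6.1 defaults
# (illustrative; not part of the protocol).

# Default maximum allotted times, in minutes, from Step 6.1.
MAX_ALLOTTED_MIN = {
    "approach": 10,
    "activate": 10,
    "vote": 10,
    "review": 10,
    "cast": 10,
    "exit": 10,
}

def task_completed(task, elapsed_min, assisted):
    """Successful completion: within the allotted time and without assistance."""
    return elapsed_min <= MAX_ALLOTTED_MIN[task] and not assisted

print(task_completed("vote", 8.5, assisted=False))    # True
print(task_completed("review", 12.0, assisted=False)) # False: over time
print(task_completed("cast", 3.0, assisted=True))     # False: assisted
```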
E2E-Acc Step 6.3 Successful Task Completion If a participant cannot complete a task within the allotted times, their session will be considered terminated, at which time they will be thanked, compensated, and escorted to their transportation. If the participant cannot successfully complete a task, this should be noted. Non-completion of a task may result from any of these possible conditions: (1) some limitation in their interaction with the VSUT, (2) insufficient allotted task time, (3) a need for personal assistive technology, or some combination of 1, 2, and 3. Examples of task non-completion include not being able to activate the machine, not being able to make ballot selections, not being able to review or verify selections made, and/or not being able to cast one's ballot. In some cases these may take the form of a participant having the intention to carry out the action but being prevented from doing so by some limitation in their interaction with the system. In other cases, a participant may have actually completed a given task but may be unaware that they have done so because it is not clear to them that the system has reached a given state. In some cases, even if a test administrator has determined that a participant is unable to successfully complete a task, the administrator should have the participant continue to attempt subsequent tasks, for the purposes of complete data collection and reporting. In such a situation, the test administrator completes the task for the participant, carefully noting all steps taken (so that they can be appropriately accounted for when analyzing data from the test). Steps that are performed by the test administrator must be excluded from the data analysis. For example, any ballot selections made (such as completion of a write-in) by the test administrator to complete the task of making ballot selections must be removed from consideration. Only data generated by the user should be included in data analysis. 
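The rule that administrator-performed steps must be excluded from analysis can be sketched as a simple filter over session records. The event layout and field names below are assumptions for illustration only.

```python
# Sketch of excluding administrator-performed steps from data analysis
# (illustrative; the record layout is not specified by the protocol).

def user_only(events):
    """Keep only events generated by the participant."""
    return [e for e in events if e["actor"] == "participant"]

# Example: a session in which the administrator completed a write-in
# and cast the ballot on the participant's behalf.
session_events = [
    {"actor": "participant",   "action": "select",            "contest": 1},
    {"actor": "participant",   "action": "select",            "contest": 2},
    {"actor": "administrator", "action": "complete write-in", "contest": 3},
    {"actor": "administrator", "action": "cast ballot",       "contest": None},
]

analyzable = user_only(session_events)
print(len(analyzable))  # 2 -- the administrator's steps are excluded
```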
If a given participant does not arrive as planned or does not follow instructions, their session may be terminated. In such cases, contact and test the respective backup participant as a replacement. E2E-Acc Step 6.4 Collecting timing data during testing Two types of timing data are collected during testing. The first is session timing data. It measures the elapsed time for the entire session, starting when a participant begins interacting with the system and ending when they cease interacting with the system. The second is task-specific timing data. It measures the elapsed time for each task a participant attempts to complete, beginning when they start attempting to perform a given task and ending when either a) the maximum allotted task time has been reached or b) they indicate they have completed the given task. In all cases, data loggers must exercise care in recording accurate times. Accurate timing devices (to the nearest second) should be used for measuring timing data. The data logger is responsible for timing. If the facilitator is required to initiate the voting session as a poll worker, timing will begin when the facilitator has completed initiation. If the participant starts the voting session, timing will begin when the participant reaches the system. Note that there may be a delay before the participant actually commences the first step (i.e., enters the activation card, receives the paper ballot, etc.); this delay is considered to be part of the session to be timed, since the time taken to understand how to begin is significant. Task timing Once the participant has completed a given task, the data logger will record the elapsed time. What constitutes completion of a task depends on the type of system. In all cases, per the protocol, a participant is instructed to indicate when they have completed a given task. The completion of the last task will constitute completion of the entire session. 
Thus, completion of the final task may also be taken as the end time for the overall session. Overall session timing The overall session time will be considered to be the full time elapsed between the starting time for the first task and the ending time for the last task. E2E-Acc Step 6.5 Sufficient number of sessions The test should be conducted until 16 valid sessions have been performed (4 valid trials of the test per disability). If, however, the entire pool of backup participants has been exhausted and the desired number of valid sessions has not yet been reached, the protocol must be abandoned as invalid and rerun at a later time with a new set of participants. E2E-Acc Step 7. Debrief participants Once the participant has completed the voting session, a facilitator administers the post-test questionnaire. The questions ask the participant about their demographic data and voting experience, as well as their experience during the given testing session. Of crucial importance to the validity of the test is a single question asking them to indicate whether or not they tried to follow the instructions to perform tasks and to vote as instructed. If a participant indicates that they did not try to follow instructions, then the test administrator must consider their data invalid (details TBD) and must replace them with a backup participant (up to 2 backup participants per disability will be retained on a "stand by" basis). If, after including 2 additional participants for any disability group, the test administrators cannot obtain 4 "valid" trials of the test per disability, the test must be deemed "invalid" and must be terminated. When a sufficient number of participants have answered "Yes" (that they did try to follow instructions), then proceed with the data collection step below. Finally, provide the participants with their respective compensation ($50.00/hr) and thank them for their time. E2E-Acc Step 8. 
Data collection The essential data recorded for each session includes: Per-task: completed? without personal assistive technology? within max allotted time? Overall: completed all tasks? All records pertaining to the test data (whether created by the voting system or by the test facilitator) should be stored safely and privately for future reference. The purpose is twofold: first to protect participant privacy, and second to allow any questions about the test results to be resolved based on direct evidence. E2E-Acc Step 9. Analyze data A pass/fail determination is made for the VSUT, based upon expert inspection and the results of the controlled experiment, which include the number of participants in each disability group who complete each of the tasks (activation, candidate selection, ballot review, and ballot casting). Throughout, successful task completion means that the participant performed the indicated operation within the time allotted for that task, and without external assistance (such as coaching from another person). Only if a system passes both the expert inspection of the voting system and the controlled experiment with participants can it pass for this requirement. FP => If there was any task that a majority of any disability group did not complete, then, for requirement "Accessibility Throughout the Voting Session", the system fails, otherwise it passes. A system is accessible if participants can fill out their ballot correctly, i.e. according to the instructions handed out by the facilitator. The test administrator must inspect each ballot as cast and determine whether or not it was filled out exactly as instructed, or if there were any errors. FP => If a majority of any disability group made at least one incorrect choice on their ballots, then, for requirement "Accessibility Throughout the Voting Session", the system fails, otherwise it passes. 
The system must provide all the assistive technology needed for completing the tasks. However, participants are permitted to use their own personal devices if they so choose. As you observe the participants, ensure that the system offers all the necessary equipment. FP => If any task requires any participant's use of personal assistive technology (beyond that provided by the system), then, for requirement "No Dependence on Personal Assistive Technology", the system fails, otherwise it passes. E2E-Acc Step 10. Report system results Report items should include: Identification (make and model) of the VSUT Results of each pass/fail assertion evaluated (both for the expert inspection/review as well as the test) The report should be prepared in the Common Industry Format (CIF). Test Method: Accessible Ballot Verification and Submission Covers requirements: 3.3.1-E Accessibility of Paper-based Vote Verification 3.3.1-E.1 Audio Readback for paper-based Vote Verification 3.3.3-E Ballot Submission and Vote Verification Two testers with accessibility expertise shall proceed through an entire voting session using the default ballot choices. P => If the system does not use a paper-based record for vote verification, then, for requirement "Accessibility of Paper-based Vote Verification", the system passes. P => If the system does not use a paper-based record for vote verification, then, for requirement "Audio Readback for paper-based Vote Verification", the system passes. If the system is one that generates a paper record (or some other durable, human-readable record) for ballot verification, then the testers shall verify that a mechanism is provided that can read that record and generate an audio representation of its contents. PF => If the system provides audio readback for paper verification records, then, for requirement "Audio Readback for paper-based Vote Verification", the system passes, otherwise it fails. 
Furthermore, the paper verification record must be accessible to voters with dexterity, mobility, and other disabilities. For example, the record must be positioned so as to be easily visible by a voter in a wheelchair. PF => If the system's paper verification records are fully accessible, then, for requirement "Accessibility of Paper-based Vote Verification", the system passes, otherwise it fails. If the voting station supports ballot submission for sighted voters, the testers shall proceed through the process of ballot submission, using the features provided for blind voters and shall verify that these features constitute a viable mechanism for such voters. P => If the system does not support ballot submission for sighted voters, then, for requirement "Ballot Submission and Vote Verification", the system passes. PF => If the system allows blind voters to submit the ballot without significant difficulty, then, for requirement "Ballot Submission and Vote Verification", the system passes, otherwise it fails. Test Method: Partial Vision Covers requirements: 3.3.2-B Adjustable Saturation for Color Displays 3.3.2-C Distinctive Buttons and Controls The tester shall directly examine any hardware buttons and controls intended for use by the voter and verify that no two have an identical shape, nor do any two have identical colors. This requirement does not apply to sizeable groups of keys, such as a conventional 4x3 telephone keypad or a full alphabetic keyboard. F => If any pair of hardware buttons and controls have the same color or same shape, then, for requirement "Distinctive Buttons and Controls", the system fails. Throughout the following test method, the tester shall make a similar check for the distinctiveness of any buttons and controls that are presented on-screen at the same time. The tester shall select a low saturation color presentation and then proceed through the voting session. The tester shall note the displayed level of saturation. 
After voting for US Representative (Contest #3), the tester shall then select the highly saturated color option, and vote through contest #6, verifying that the new color is distinctively more saturated than the original. F => If a higher saturation color cannot be obtained, then, for requirement "Adjustable Saturation for Color Displays", the system fails. The tester shall navigate back to the first three contests and verify that they are being displayed with high saturation and that the original ballot choices were preserved. F => If any previous contest is not now displayed in high saturation, then, for requirement "Adjustable Saturation for Color Displays", the system fails. F => If any original ballot choice is not preserved, then, for requirement "Adjustable Saturation for Color Displays", the system fails. After voting for Registrar of Deeds (Contest #6), the tester shall then re-select a low saturation color and vote through contest #9, and repeat the above process, verifying that the presentation is now of low saturation and that earlier ballot choices have been preserved. F => If any previous contest is not now displayed in low saturation, then, for requirement "Adjustable Saturation for Color Displays", the system fails. F => If any original ballot choice is not preserved, then, for requirement "Adjustable Saturation for Color Displays", the system fails. F => If throughout the session, any two on-screen buttons and controls have the same shape or same color, then, for requirement "Distinctive Buttons and Controls", the system fails. Test Method: Audio-Tactile Interface Covers requirements: 3.3.3-B Audio-Tactile Interface 3.3.3-B.1 Equivalent Functionality of ATI 3.3.3-B.2 ATI Supports Repetition 3.3.3-B.3 ATI Supports Pause and Resume 3.3.3-B.4 ATI Supports Transition to Next or Previous Contest 3.3.3-B.5 ATI Can Skip Referendum Wording This test requires two systems. 
The tester shall proceed through an entire voting session, using the conventional visual interface on system A and the audio-tactile interface (ATI) on system B in parallel. Use the default ballot choices, except as noted below.

Check for the presence of full instructions and feedback in the ATI, including at least the items described in the discussion of Requirement 3.3.3-B "Audio-Tactile Interface" and Requirement 3.2.4-A "Completeness of Instructions".

F => If the ATI does not provide full instructions and feedback as described, then, for requirement "Audio-Tactile Interface", the system fails.

Check for the equivalence of functionality between system A (visual interface) and system B (ATI) throughout the voting session.

F => If system A provides any functionality that is absent in system B, then, for requirement "Equivalent Functionality of ATI", the system fails.

In contest #5, attempt to cause the ATI to repeat the candidates' names for Lt-Governor.

F => If the system cannot be made to provide this repetition, then, for requirement "ATI Supports Repetition", the system fails.

In contest #9, attempt to cause the ATI to pause and then resume as it announces the name of the 2nd candidate for county commissioner, and again for the 4th candidate.

F => If the system cannot be made to provide this pause and resume, then, for requirement "ATI Supports Pause and Resume", the system fails.

In contest #11, as the first candidate for water commissioner is being announced, skip ahead immediately to the next contest, for city council. As the second candidate for city council is being announced, return to contest #11, and then return to the already-voted contest #10 for Court of Appeals Judge. Finally, as contest #10 is being re-announced, skip ahead to contest #11.

F => If these operations cannot be performed, then, for requirement "ATI Supports Transition to Next or Previous Contest", the system fails.
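The navigation behaviors exercised above (pause/resume and moving between adjacent contests, including back to an already-voted contest) amount to a small state machine. The sketch below is a hypothetical model of that state machine, useful for planning or scripting the test sequence; it is not a model of any actual voting system's ATI.

```python
class ATISession:
    """Hypothetical model of ATI playback state for test planning."""

    def __init__(self, contests):
        self.contests = contests   # ordered contest names on the ballot
        self.index = 0             # contest currently being announced
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def next_contest(self):
        # Skip ahead, e.g. from contest #10 to contest #11.
        if self.index < len(self.contests) - 1:
            self.index += 1

    def previous_contest(self):
        # Return to an earlier (possibly already-voted) contest.
        if self.index > 0:
            self.index -= 1

    def current(self):
        return self.contests[self.index]
```

Walking the model through the contest #10/#11 sequence above confirms the required transitions are: next, previous, previous, next.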
As Referendum #2 (PROPOSED CONSTITUTIONAL AMENDMENT D) is being read, skip the reading of the full text of the amendment and go directly to the choice of voting yes or no.

F => If the system cannot be made to skip immediately to the voting choice, then, for requirement "ATI Can Skip Referendum Wording", the system fails.

Test Method: Audio Volume

Covers requirements:
3.3.3-C.4 Initial Volume
3.3.3-C.5 Range of Volume

The tester shall initiate the voting session using the ATI and the default ballot choices. Do not adjust the sound volume, so as to accept the default volume provided by the system. The volume produced during the announcement of candidates in contest #1 (President and Vice-President) shall be measured (see below) as the initial volume.

PF => If the initial volume is measured to be between 40 and 50 dB SPL, then, for requirement "Initial Volume", the system passes, otherwise it fails.

Next, the tester shall adjust the volume to the minimum allowed and measure the announcement of candidates in contest #2 (US Senate) as the minimum volume.

F => If the minimum volume is not approximately 20 dB SPL (± 10%), then, for requirement "Range of Volume", the system fails.

The tester shall then increase the volume gradually, up to the maximum allowed, and measure the announcement of successive candidates in the contests presented. If the volume control has discrete increments, the tester shall increase the volume by one increment for each step. If the volume control has a continuous adjustment, the tester shall attempt to increase the volume by an amount no greater than 10 dB SPL for each step. The ATI's "pause and resume" feature may be useful in performing these steps.

F => If the measured difference in volume between any two successive steps is greater than 10 dB SPL, then, for requirement "Range of Volume", the system fails.

F => If the final (maximum) volume is not approximately 100 dB SPL (± 10%), then, for requirement "Range of Volume", the system fails.
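The numeric verdicts in the Audio Volume test reduce to a few threshold comparisons on the measured levels. The helpers below are a minimal sketch of those comparisons, assuming the tester has a list of measured values in dB SPL; they simply encode the limits stated above (40-50 dB SPL initial, 20 dB SPL ± 10% minimum, 100 dB SPL ± 10% maximum, and no step greater than 10 dB SPL).

```python
def initial_volume_ok(db):
    """3.3.3-C.4: the default volume must lie between 40 and 50 dB SPL."""
    return 40.0 <= db <= 50.0

def within_tolerance(db, target, pct=10.0):
    """True if a measured level is within +/- pct% of the target (20 or 100 dB SPL)."""
    return abs(db - target) <= target * pct / 100.0

def steps_ok(levels):
    """3.3.3-C.5: no two successive volume steps may differ by more than 10 dB SPL.

    `levels` is the ordered list of measured step volumes in dB SPL.
    """
    return all(later - earlier <= 10.0 for earlier, later in zip(levels, levels[1:]))
```

For instance, a measured sweep of [21, 30, 39, 48, ..., 98] dB SPL passes `steps_ok` and both endpoint tolerances, while a jump from 20 to 35 dB SPL between adjacent steps fails.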
Measuring Sound Volume

Volume is measured in one of two ways, depending on whether the audio information is presented through open air or through headphones or a handset. For both modes: since speech is used as the test signal, it shall be continuous for the entire measurement period (at least 15 seconds) and averaged over that period. The measuring equipment shall have an accuracy of at least ± 0.5 dB SPL, an A-weighting filter, and a range from 15 dB SPL to 120 dB SPL.

Open Air Volume

General setup and test methods are as described in IEEE 1329. The volume is measured as the dB SPL level of the audio information with a sound meter at the conventional head position(s) of a voter operating the voting system. Open air sound levels are to be measured in anechoic conditions to prevent reflections from affecting the measurement accuracy. If the voting system is designed for operation both sitting and standing, then measurements shall be taken for both operating positions.

Headphone/Handset Volume

The test is described in IEEE 269. The referenced standard specifies the required test equipment, test setup, and test procedures. Follow the test methodology relevant to receiving audio through the private audio output device applicable to the VSUT. "Headphones" equates to the term "headsets" used in Clause 9 of the referenced standard. For a HATS (Head and Torso Simulator), Type 3.3 ears shall be used as defined in Clause 5 of the referenced standard. If the ERP (Ear Reference Point) is not specified by the manufacturer of the private audio output device, then the defaults in the referenced standard shall be used.

Test Method: Non-Manual Operation

Covers requirements:
3.3.4-B Support for Non-Manual Input
3.3.4-C Ballot Submission and Vote Verification

A tester with accessibility expertise shall proceed through the editable ballot session, using the visual/non-manual interface of the system.
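As background on the sound-volume averaging described above: dB SPL is defined as 20·log10(p/p0), where p is the RMS sound pressure and p0 = 20 µPa is the reference pressure, and the energy average is taken over the whole measurement period. The sketch below illustrates that arithmetic only; it assumes hypothetical A-weighted RMS pressure samples in pascals and is not a substitute for the IEEE 1329 / IEEE 269 procedures.

```python
import math

P_REF = 20e-6  # reference sound pressure, 20 micropascals

def average_spl(pressures):
    """Average sound pressure level (dB SPL) over a measurement period.

    `pressures` are hypothetical A-weighted RMS pressure samples in pascals,
    taken across the >= 15 second speech signal; the level of the
    energy (mean-square) average over the period is returned.
    """
    mean_square = sum(p * p for p in pressures) / len(pressures)
    return 20.0 * math.log10(math.sqrt(mean_square) / P_REF)
```

For example, a constant RMS pressure of 0.02 Pa corresponds to a 60 dB SPL average, which falls comfortably inside the required 15-120 dB SPL meter range.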
Do not at any time make use of your hands to operate the system. As you proceed through the voting session, verify the following.

The tester shall assess the basic usability of the mechanism provided for non-manual operation of the system. This includes such operations as selecting candidates, changing a vote, writing in a candidate, and navigating the ballot. It is not required that this operation be "just as easy" as manual operation, but it should be reasonably accessible. If the system provides several such mechanisms, each must be evaluated.

F => If the mechanism for non-manual operation of the system causes significant difficulty, then, for requirement "Support for Non-Manual Input", the system fails.

The tester must also check that the same functions are supported for non-manual use as for manual use. In particular, the functions exercised by the editable ballot session, such as changing votes, navigating back and forth, and writing in a candidate, must all be supported.

PF => If non-manual use of the system is functionally equivalent to manual use, then, for requirement "Support for Non-Manual Input", the system passes, otherwise it fails.

If the voting station supports ballot verification and/or submission for non-disabled voters, the tester shall proceed through these processes using the features provided for voters with dexterity disabilities and shall verify that these features constitute a viable mechanism even for voters who have no use of their hands.

PF => If the mechanism for non-manual ballot verification and/or submission can be used without significant difficulty, then, for requirement "Ballot Submission and Vote Verification", the system passes, otherwise it fails.

End Usability and Accessibility Test Methods