HFP Subcommittee Teleconference
Friday, September 8, 2006 at 2:00 p.m. ET

Agenda:

0. TGDC and EAC updates from Allan, John Wack
1. Status of usability testing research, Sharon
2. Discussion of VVSG version 2 draft:
      - general structure
      - Beginning of document through 3.2.2.3 vendor testing
            (See: http://vote.nist.gov/TGDC/hfpsections-090106.pdf)
3. Other progress and items, Sharon

Participants: Allan Eustis, John Cugini, Philip Pearce, David Baquis, Alexis Scott-Morrison, Whitney Quesenbery, John Gale

Administrative Updates:

  • John Wack on leave

  • AE: A number of us will be participating in the MD and DC primary elections as judges. We've done some outreach, visiting and observing an election in WY, and we're going to Seattle on the 19th to see their primaries, including the pre-election L&A testing and the post-election canvassing, to learn about other states' procedures.

  • AE: We are continuing to communicate with the EAC to speed up the process of getting Philip and Tricia on board, as well as the nominees from ANSI and the National Association of State Election Directors.

Status Update:

  • All pre-testing approvals for the usability research are in place. Bill Killam is starting usability testing. The contract with Jenny Redish has been approved. We've identified issues that would slow down the process if not addressed, and we've identified machines the testers can use. On the machines we have, Allan has identified the ballot language, which can be changed.

  • We have a draft human subjects study and Paperwork Reduction Act questionnaire package ready for submission. Sharon will review the draft this weekend.

  • Whitney: Is there a target date for the first preliminary report on the research Bill is doing? It will take a couple of weeks to run the testing, with initial results within 4 to 6 weeks. It would be nice to have something to start the discussion process by the first of December.

  • Sharon: Contacted by the Commission on Law and Aging of the American Bar Association in DC. They're running a symposium in March on "Facilitating Voting as People Age - Implications of Cognitive Impairment." While they're looking at a lot of legal and policy issues in this workshop, they needed some input about the technology, so Sharon's 15-minute overview was useful for them. They are commissioning papers on different policy issues, and Sharon sent the paper she wrote with Jenny. Sharon will be participating on the technological aspects.

Review of Report (John Cugini):

  • Whitney: General comments. Most of the time the order was quite good within sections. When it doesn't affect the flow in other ways, put the "shalls" in front of the "shoulds." I asked myself whether, reading this for the first time, I would get a good idea of what was expected, and my answer was yes. I read for ambiguity and simplicity. It looks like there are more comments than there actually are.

  • John: There is a bit of a structuring issue; there are now 3 subsections. Sharon suggests keeping annotations on paper in case the EAC wants to know why it was structured this way. 3.1 is pure prose, with no hard requirements, just explaining what we mean by usability and accessibility and the general principles.

  • Whitney: The paragraph about familiarity - what we hear is all about making sure people know when change happens. John put it in because voting is not something we do every day. We might want to reword it to say "to gain deeper expertise," or perhaps use wording such as "It needs to be self-teaching and walk you through the process."

  • Whitney: Stumbling over the language regarding EVID. She doesn't have a good suggestion for rewording it; the issue is with the terms used - electronic and editable. Alexis has the same issues. She is working on changing it, because it's not intuitive. Perhaps use "manual" versus "editable."

  • John: Section 3.3 covers usability requirements and accessible voting stations and how they interact. Do we need to say more or less? It works for Whitney and Alexis.

  • John: 3.2 contains the general usability requirements, as opposed to accessibility, so they apply to machines generally. This is a general overview, carried over from VVSG 05, quoting HAVA as requiring us to do this. Sharon says that people are reading the HAVA quote as the first requirement. We don't want people to think it's one of our requirements; something has to be done to point this out - a clarifying note or a figure. Our requirements are voluntary.

    Whitney: Should we put a statement in there that says "All of our requirements are essentially an attempt to provide detailed requirements to meet the mandate of HAVA"? The EAC will weigh in on our text. We can't contradict HAVA.

  • John: Performance requirements. There is a paragraph explaining that we're applying general usability principles to voting, defined as usability satisfaction. There's not much sense in word-smithing it at this point.

    Whitney: Getting into the performance requirements themselves - maybe we shouldn't look at these until we get Killam's research. Maybe some things, but anything with a rating of XXX we should hold off on. There is a question about the level of abstraction with which the requirement refers to the test. One way of doing this is saying that the system must meet the NIST test protocol, whatever it is; or we could put more of the testing information in here. Are you comfortable with the level of abstraction?

    Whitney: There's a huge debate about whether we should be looking at these tasks at all, and discussion about how much we can break them down. Maybe it should just say that you need to submit a report. How much should be in the requirement? Are these the flavor of what we're looking for? Whitney thinks we're not going to get statistical validity on individual task analysis; we're looking for an overall assessment across the board. The detailed tasks are there so we can make sure the test participants have actually exercised the system and that the overall performance of the system meets some baseline. That detail should not go in the requirements. We should weight things so that more frequent tasks count more heavily in the final score. The vendor may want a more detailed report. Breaking it down to a lower level isn't the point - it's not that we wouldn't get the data, but the real question is how well, on average, voters will do across the population. We might want to break out some very specific things like write-ins or straight-party voting, so a jurisdiction could see the results for features it uses. The public report is not where this level of detail should go.

    Is there a passing score on every sub-test? Each section must receive a certain score, with the total being the combination of the scores. Can you be really bad on one test and make up for it by doing really well on another? NIST should be able to take the data from the pilot test and run it a couple of different ways. Are we in a situation where systems are all over the map and a slight change in the metric would push the possibility of passing one way or the other? Or are there machines that are clearly good and others that are not? One of the questions we must ask ourselves is: are we trying to say "this is good enough," or are we trying to set a gold standard that machines should aspire to? Conformance testing sets a low threshold. We're going to set the norm based on where we are. Having a usability test will make sure we don't have any horribly bad systems, and it gives the machines that are trying to be better something to strive for.

  • John Gale: In judging usability performance and thinking of write-ins, I'm not sure what percentage of votes are write-ins from state to state, but it's a low percentage, and it seems that usability should rank just as high for the 99% of people who have committed to a candidate.

    Whitney: In NJ, write-ins are a critical factor. They chose their systems to make sure write-ins were handled correctly. We need to look at the metrics for anything used infrequently and make sure that a low score on it can't tip the balance. Specific tasks will still be evaluated; whether or not each one is reported as pass or fail, the results will be reported.
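    To make the scoring discussion above concrete, here is a minimal sketch (illustrative only, and not part of the draft requirements) of one way per-task scores might be combined: more frequent tasks carry more weight in the total, and each task also has a minimum floor so that a rarely used feature such as write-ins is flagged separately rather than silently tipping the overall balance. The task names, weights, and thresholds below are invented for illustration.

        # Hypothetical sketch of a weighted usability score with per-task floors.
        # All task names, weights, and thresholds are invented for illustration.
        TASKS = {
            # task: (frequency_weight, minimum_passing_score)
            "vote_single_contest":    (0.50, 0.80),
            "review_and_change_vote": (0.30, 0.75),
            "write_in_candidate":     (0.15, 0.60),
            "straight_party_vote":    (0.05, 0.60),
        }

        def evaluate(scores, overall_threshold=0.75):
            """Return (passes, weighted_total, failed_tasks) for one system's test data."""
            failed = [t for t, (_, floor) in TASKS.items() if scores[t] < floor]
            total_weight = sum(w for w, _ in TASKS.values())
            weighted_total = sum(scores[t] * w for t, (w, _) in TASKS.items()) / total_weight
            return (not failed and weighted_total >= overall_threshold, weighted_total, failed)

        # Example: strong on common tasks, weak on write-ins.
        ok, total, failed = evaluate({
            "vote_single_contest": 0.95,
            "review_and_change_vote": 0.90,
            "write_in_candidate": 0.55,
            "straight_party_vote": 0.85,
        })
        print(ok, round(total, 3), failed)  # the write-in floor is missed, so the system is flagged

    Whether a missed floor on an infrequent task should fail the system outright, or simply be reported to jurisdictions that use the feature, is exactly the open question in the discussion above.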

  • Whitney: Security. The paper that John circulated on Security (Software Independence) was quite interesting. It would be good for HFP to read. (Attached Below)

    AE: Ron Rivest is going to forward it to the TGDC for comments.

  • John: Vendor testing section. Substantively unchanged. Vendors must conduct usability tests and report them; we are not too specific about what those tests are. We should reference the CIF and its ISO version. We do need to be more specific with vendors about what they need to include. Whitney likes the idea of customizing. What do we mean by the general population? How much should they report? Should this be an addendum? It would be a template oriented toward the voting application. The vendor should say how they recruit people. We want to ensure consistency. Susan Roth will be looking at this, per Sharon L.

  • When should we send this to the TGDC for vetting before the December meeting? The first weeks of October? Maybe we should start looking at the other subcommittees' work, or at least get some highlights from the other subcommittees for HFP to review for general comments.

Next meeting September 29, 2006, 11:00 a.m.


Taxonomy of Voting System Records Production Approaches
Prepared for the STS Telecon
September 7, 2006

This is a brief, high-level paper on voting system approaches for the purposes of ballot records auditing. It presents a way of categorizing these approaches in the VVSG 2007 using the class structure. It is meant for discussion purposes only.

We group different approaches to voting system design into two broad categories: software-independent and software-dependent approaches. Software-dependent approaches are best exemplified by today's DRE systems: the accuracy of the captured votes depends to a large extent on the accuracy of the software used to record the votes. DREs do not produce other records that can be used to positively verify the accuracy of the captured votes.

Software-independent approaches, on the other hand, produce voting records in such a way that their accuracy can be verified even if the voting system software contains errors or deliberate fraud. Such approaches should be, in theory, less expensive to test than software-dependent approaches. While VVPAT is one example of this approach, some end-to-end cryptographic approaches are also software-independent. The category Independent Dual Verification (IDV) covers a variety of voting system approaches, including current VVPAT and Op Scan systems (combined with Electronic Ballot Marking devices), as well as the more theoretical Witness approaches.

While some of these designs, e.g., VVPAT, are purely software-independent, other designs such as Witness are somewhat software-dependent. This bears more explanation, as follows:

In VVPAT, for example, the voter's indirect verification of the DRE's electronic record is backed up by the voter's direct verification of the paper record. Furthermore, the paper record cannot be changed by the voting system after the voter has verified it, so it can be meaningfully compared with the electronic record(s). Of course, some software is still involved and paper can be mishandled at later stages, so further security measures are still required. But the two records can be compared for accuracy, and errors or fraud in the voting system software can be detected.
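As a minimal sketch (not from the paper), the comparison described above can be thought of as checking two independently produced record sets against each other; any ballot record present in one set but not the other indicates an error or possible fraud. The record format and equality rule below are simplified assumptions.

    # Minimal sketch, not from the paper: compare an electronic record set against an
    # independently produced record set (e.g., voter-verified paper records).
    # The record format and the equality rule are simplified assumptions.
    from collections import Counter

    def compare_records(electronic, independent):
        """Return the ballot records present in one set but not the other (as multisets)."""
        e, p = Counter(electronic), Counter(independent)
        return {"only_electronic": e - p, "only_independent": p - e}

    # Example: one electronic record has no matching verified paper record.
    electronic = [("contest-1", "Candidate A"), ("contest-1", "Candidate B")]
    paper      = [("contest-1", "Candidate A"), ("contest-1", "Candidate A")]
    print(compare_records(electronic, paper))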

In the Witness design (a theoretical approach that no vendor admits to pursuing, but one that is useful for illustrative purposes), a camera takes a picture of the DRE's summary screen immediately after a voter finalizes his or her ballot, and the voter does not verify that the picture was taken. One can imagine that the voting system must somehow signal that the summary screen is being displayed so that the photo can be taken, or some timing protocol must be in effect so that the events are synchronized. For this to occur, software (or software resident in hardware) must be trusted to work correctly. Even if a voter is able to monitor the recording of the photo, software is involved.

Thus, one indirect verification takes place - if the camera displays the photo it has taken, two indirect verifications are possible. But, the camera-related software involved is hopefully relatively small and thus more easily verified for correctness than, say, the DRE itself. Two or more records are produced, and the DRE's electronic records can be compared against the digital photos and verified.

This approach would be preferred over the pure DRE approach. Consequently, some software-dependent approaches are preferred over others. More testing of these approaches is warranted, with some sort of a sliding scale going from IDV approaches (less testing) to DRE approaches (more testing).

The high-level taxonomy of software-independent and -dependent approaches, then, would be as follows:

1. Software-Independent Approaches
      a. End-to-End Cryptographic Voting Protocols
      b. IDV
            i. Paper Based Systems
                  1. VVPAT
                  2. EBM/Op Scan
                  3. MMPB/Op Scan (?)

2. Software-Dependent Approaches
      a. IDV
            i. Witness approaches (e.g., VoteGuard, http://www.democracysystems.com/)
            ii. Other schemes using 2 indirect verifications
      b. DRE
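
Read literally as a class structure (as the introduction suggests the VVSG 2007 will use), the taxonomy above might be sketched roughly as follows. This is an illustrative assumption only; the class names are invented and are not the VVSG 2007 definitions.

    # Illustrative sketch only: the taxonomy expressed as a simple class hierarchy.
    # Class names are invented; they are not the actual VVSG 2007 classes.
    class VotingSystemApproach: ...

    class SoftwareIndependent(VotingSystemApproach): ...
    class SoftwareDependent(VotingSystemApproach): ...

    class EndToEndCryptographic(SoftwareIndependent): ...

    class IDV(VotingSystemApproach): ...            # IDV appears under both branches
    class PaperBasedIDV(IDV, SoftwareIndependent): ...
    class VVPAT(PaperBasedIDV): ...
    class EBMOpScan(PaperBasedIDV): ...
    class MMPBOpScan(PaperBasedIDV): ...            # marked "(?)" in the taxonomy above

    class WitnessIDV(IDV, SoftwareDependent): ...   # e.g., camera-based Witness designs
    class DRE(SoftwareDependent): ...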

 
