The PETs Testbed provides the capability to investigate privacy-enhancing technologies (PETs) and their respective suitability for specific use cases.
The Testbed consists of example problems drawn from real-world use cases. Each problem is accompanied by a model solution that demonstrates a PETs implementation, documents the privacy risks of the implementation, and provides fidelity, utility, and privacy metrology for evaluating it. The Testbed provides fully implemented PETs architectures connected to data and hardware resources for official collaborators. It also maintains a set of public repositories that allow anyone to reproduce solutions offline and ‘play at home.’
In collaboration with the U.S. Census Bureau XD Team and offered through the National Cybersecurity Center of Excellence, NIST is developing a privacy-preserving federated learning environment as a model system to conduct cyber and privacy risk analysis.
Human genomic data is being collected and stored in data silos such as the Gene Expression Omnibus (GEO) and the NCI Genomic Data Commons (GDC). Analyzing genomic data across disparate silos can lead to improved research outcomes. For example, working across silos may be necessary to study rare cancers. Yet, aggregating information from multiple silos requires overcoming regulatory, data, security, and privacy issues. Privacy-preserving federated learning (PPFL) enables machine learning algorithms to train across silos while limiting personally identifiable information flows by adding private noise to the model weights (e.g., differential privacy) and encrypting computations (e.g., multi-party computation). PPFL is a promising architecture to balance machine learning data needs with privacy concerns. This project seeks to build PPFL model systems, frameworks, and assessment tools to help organizations evaluate the risks and benefits of PPFL systems.
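The two privacy-preserving mechanisms named above can be sketched concretely. Below is a minimal, illustrative example of the first: clipping each silo's model update and adding Gaussian noise to the average, in the style of differentially private federated averaging. All names and parameter values here are assumptions for illustration, not part of the project's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip_update(update, clip_norm):
    """Bound one silo's contribution by clipping the update's L2 norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / norm)

def dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.1):
    """Average clipped silo updates, then add Gaussian noise scaled to the
    clipping bound (the per-silo sensitivity of the mean)."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    mean = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

# Three simulated silos, each contributing a local model update
updates = [rng.normal(size=4) for _ in range(3)]
noisy_mean = dp_federated_average(updates)
```

In a real deployment, `noise_multiplier` would be derived from a target privacy budget via a DP accountant rather than chosen by hand.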
Hosted by the National Cybersecurity Center of Excellence, this component investigates the trade-offs of a distributed learning system deployed across silos of phenotype and genotype data.
The Testbed simulates several nodes (data silos), each of which contains phenotype and genotype data for a unique set of individuals. The task is to train a final output model that uses single nucleotide polymorphism (SNP) data to predict physical traits. Initial data resources come from plant and other non-human genetic data to limit actual privacy risks to people. Privacy-preserving elements, such as differential privacy and multi-party computation, will be built into the distributed learning routine to limit information that could be used to conduct reconstruction, membership inference, and other privacy attacks.
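The multi-party computation element mentioned above can be illustrated with additive secret sharing, one common building block for secure aggregation: each silo splits its integer-encoded update into random shares, and aggregators only ever see shares or sums of shares, never a raw update. This is a simplified sketch; the field modulus, encoding, and protocol details are illustrative assumptions, not the Testbed's design.

```python
import numpy as np

PRIME = 2**31 - 1  # illustrative field modulus; real protocols choose this carefully
rng = np.random.default_rng(1)

def share(value, n_parties):
    """Split an integer vector into n additive shares that sum to it mod PRIME."""
    shares = [rng.integers(0, PRIME, size=value.shape) for _ in range(n_parties - 1)]
    shares.append((value - np.sum(shares, axis=0)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod PRIME."""
    return np.sum(shares, axis=0) % PRIME

# Three silos each secret-share an integer-encoded model update
updates = [rng.integers(0, 1000, size=4) for _ in range(3)]
all_shares = [share(u, n_parties=3) for u in updates]

# Aggregator j sums the j-th share from every silo; no single aggregator
# ever holds a complete update
partial_sums = [sum(all_shares[i][j] for i in range(3)) % PRIME for j in range(3)]
total = reconstruct(partial_sums)  # equals the sum of the three updates mod PRIME
```

Because the shares are uniformly random, any individual share reveals nothing about a silo's update, yet the aggregated total is exact.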
PPFL output models will be evaluated using a set of fidelity and utility metrics. Privacy risks will be evaluated using a set of empirical privacy evaluation metrics and further tested through a series of red-teaming exercises.
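One common empirical privacy metric of the kind described above is the success rate of a loss-threshold membership inference attack: if training members systematically have lower loss than non-members, an attacker can distinguish them, and the attack's AUC quantifies the leakage. The sketch below computes that AUC via the rank-sum formula; it is a generic illustration, not the Testbed's specific metric suite.

```python
import numpy as np

def membership_auc(member_losses, nonmember_losses):
    """AUC of a loss-threshold membership inference attack.

    Lower loss is treated as evidence of membership, so scores are negated
    losses. An AUC near 0.5 indicates the attack does no better than chance;
    an AUC near 1.0 indicates strong membership leakage.
    """
    scores = np.concatenate([-member_losses, -nonmember_losses])
    labels = np.concatenate(
        [np.ones(len(member_losses)), np.zeros(len(nonmember_losses))]
    )
    # Rank-sum (Mann-Whitney) computation of the AUC
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Evaluating this metric on a PPFL output model versus a non-private baseline shows how much the privacy-preserving elements reduce membership leakage.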
The Testbed will develop a set of model PPFL architectures. These model architectures will be fully implemented in a cloud environment. Each architecture will be subjected to a privacy threat evaluation that will involve the use of tools such as fidelity and utility metrics, the NIST Privacy Framework, a suite of empirical privacy metrics, and the outcomes of a privacy red-teaming exercise. These tools will serve as a framework for helping organizations navigate the tradeoffs in a PPFL system.
By the end of 2024, NIST expects to have a fully functional demonstration PPFL architecture and a suite of tools to evaluate model utility and privacy risks. We will be inviting collaborators to develop PPFL algorithms and architectures, contribute to our evaluation metrics, and participate in our red-teaming exercises.
Send an email to ppfl [at] list.nist.gov to receive project updates.