The growing influence of data is accompanied by an increased risk of exploitation when information falls into the wrong hands. Weaknesses in the security of the original data threaten the privacy of individuals, and even a dataset that redacts individual identities is vulnerable to misuse. NIST PSCR set out to protect personally identifiable information while maintaining a dataset's utility for analysis, safeguarding individual privacy while allowing researchers to put public safety data to beneficial use.
PSCR has hosted a series of prize challenge competitions focused on advancing data privacy while protecting personally identifiable information. In one, participants worked with a dataset of emergency response events in San Francisco and a sub-sample of the IPUMS USA data for the 1940 U.S. Census. Since then, challenge contestants have applied common analytics tasks such as clustering, classification, and regression to test whether a synthetic (or artificial) dataset can serve as a practical replacement for the original sensitive data.
Contestants proved mathematically that their synthetic data generators satisfy the rigorous differential privacy guarantee, giving confidence that the synthetic data they produce contains no information that can be traced back to specific individuals in the original data. NIST PSCR produced five open source algorithm solutions that may be used for continued research. Next, PSCR will incentivize the development of a privacy-preserving dashboard map that shows how map data segments change over time.
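To make the differential privacy guarantee more concrete, the sketch below shows the classic Laplace mechanism applied to a simple counting query. This is an illustrative example of the general technique, not the contestants' actual generators; the function names and the choice of a counting query are assumptions made for demonstration only.

```python
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) as the difference of two
    # independent exponential draws (a standard sampling identity).
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy.

    A counting query changes by at most 1 when a single individual's
    record is added or removed (sensitivity 1), so adding Laplace
    noise with scale 1/epsilon satisfies epsilon-differential privacy:
    the released value is nearly as likely with or without any one
    person's data, so the output cannot be traced back to them.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical usage: count "fire" incidents in a toy event log.
events = ["fire", "medical", "fire", "rescue", "medical", "fire"]
noisy = dp_count(events, lambda e: e == "fire", epsilon=0.5)
```

A smaller epsilon gives a stronger privacy guarantee but noisier answers; this accuracy-versus-privacy trade-off is exactly why maintaining a dataset's utility for analysis is the hard part of the problem.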
The ability for public safety agencies to use datasets without compromising personally identifiable information may make it possible to identify emergency trends more quickly, which could aid first responders and dispatch centers as they face an influx of 911 calls. These outcomes carry significant implications for continued differential privacy research and data privacy, and they also aid in the ability to save lives.