The NIST Privacy Engineering Program has launched the Collaborative Research Cycle (CRC) to spur research, innovation, and understanding of data deidentification techniques. The CRC is a revolutionary rethinking of the typical data challenge format, using metrics and benchmark data not to competitively rank approaches but to drive collaborative research towards better understanding them.
Tabular demographic data (e.g., surveys) are essential to administering government programs such as representation apportionment, welfare spending, and infrastructure planning. These data are invaluable for policy making and research purposes. Government agencies with mandates to collect this type of data are typically required both to publish the data and to maintain the confidentiality of the records. Deidentification techniques, such as those using synthetic data and differential privacy, can help organizations navigate the tension between publishing data and protecting the privacy of individuals.
The Collaborative Research Cycle (CRC) investigates the effect of deidentification algorithms on the fidelity, utility, and privacy of privatized data. Participants may use any approach to reduce the privacy risks of the NIST Diverse Community Data Excerpts, a subset of the American Community Survey. Participants submit their privatized data and an abstract on how the data were generated. We use the SDNist software package to evaluate the data. The privatized data, the abstract, and their evaluation results are then archived in a human- and machine-readable repository built for meta-analysis of the deidentification techniques. We are currently working on incorporating a much more expansive set of empirical privacy evaluation tools into SDNist to better evaluate the privacy risks of the privatized data.
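To give a rough sense of what a fidelity comparison between original and deidentified tabular data can look like, here is a minimal sketch. It is not the SDNist API and does not reproduce any SDNist metric; it computes the total variation distance between one column's distribution in two small tables. The function names, column names, and records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def marginal(records, column):
    """Return the empirical distribution of one column as {value: probability}."""
    counts = Counter(row[column] for row in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def total_variation_distance(original, deidentified, column):
    """Half the L1 distance between the two column distributions (0 = identical, 1 = disjoint)."""
    p = marginal(original, column)
    q = marginal(deidentified, column)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


# Toy records with hypothetical column names (not the real Data Excerpts schema).
original = [
    {"AGE_GROUP": "18-34", "OWNS_HOME": "yes"},
    {"AGE_GROUP": "18-34", "OWNS_HOME": "no"},
    {"AGE_GROUP": "35-64", "OWNS_HOME": "yes"},
    {"AGE_GROUP": "65+", "OWNS_HOME": "yes"},
]
deidentified = [
    {"AGE_GROUP": "18-34", "OWNS_HOME": "yes"},
    {"AGE_GROUP": "35-64", "OWNS_HOME": "no"},
    {"AGE_GROUP": "35-64", "OWNS_HOME": "yes"},
    {"AGE_GROUP": "65+", "OWNS_HOME": "yes"},
]

tvd_age = total_variation_distance(original, deidentified, "AGE_GROUP")
tvd_home = total_variation_distance(original, deidentified, "OWNS_HOME")
print(f"AGE_GROUP TVD: {tvd_age:.2f}")
print(f"OWNS_HOME TVD: {tvd_home:.2f}")
```

A single-column comparison like this only captures one narrow aspect of fidelity; it says nothing about relationships between columns, about utility for a specific analysis, or about privacy risk.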
The CRC has an archive of nearly 500 deidentified data instances (and we’re still accepting more!), each accompanied by a detailed abstract on the generation methods, and detailed evaluation data. The program also has tools to make it easy to parse and navigate the data. The CRC has hosted a series of workshops to gather the community to investigate deidentification.
Send an email to CRC+subscribe [at] list.NIST.gov to join the CRC listserv for future updates.
For collaborations, contact the program principal investigator at gary.howarth [at] nist.gov.