Real solutions from the statistical community for differentially private and high-quality data releases by national statistical institutes.
Team members: Sigurd Hermansen, Natalie Shlomo, Tom Krenzke, Jane Li, Marcelo Simas
Westat offers innovative research and services to help clients improve outcomes in health, education, social policy, and transportation. We apply our expertise in representative sampling designs, data collection and management, statistical research, evaluation, communications, and technical assistance to meet a broad range of clients' requirements. Westat is an industry leader in harnessing techniques to preserve data quality, privacy, and confidentiality. Westat is an employee-owned company and is dedicated to improving lives through research.
Sigurd Hermansen is a Westat project IT manager who designs and programs large research databases and works on data modeling and the development of mainframe, UNIX/Linux, and PC/Windows application systems. He has a master's degree in Economics from George Mason University and has completed graduate coursework in Biomedical Statistics and Computer Science at the National Institutes of Health, as well as in Computer Science at Virginia Polytechnic Institute. Sigurd has developed automated methods of statistical and predictive modeling, estimation, time series analysis, and forecasting and has managed automated processing of survey and laboratory data.
Dr. Natalie Shlomo is Professor of Social Statistics at the School of Social Sciences, University of Manchester. She is a survey statistician with interests in survey design and estimation, record linkage, statistical disclosure control, statistical data editing and imputation, and small area estimation. Dr. Shlomo is an elected member of the International Statistical Institute and currently serves as Vice President. She is also a fellow of the Royal Statistical Society and the International Association of Survey Statisticians. Dr. Shlomo is on the editorial board of several journals, including the Journal of the International Association of Official Statistics and the International Statistical Review. She is a member of several national and international methodology advisory boards.
Tom Krenzke, a senior statistician and Associate Director of Westat's Statistical Staff, has more than 25 years of experience in survey sampling and estimation techniques. He is currently vice-chair of the American Statistical Association's Privacy and Confidentiality Committee, president of the Washington Statistical Society (ASA's largest chapter), and a member of Westat's Institutional Review Board. With an M.S. in Applied Statistics from Bowling Green State University, Tom has led research in privacy-preserving methods, sample design, variance estimation, nonresponse bias, response bias when incentives are used, and variance estimation when imputed values are present. His work on statistical capabilities focuses on developing software for privacy-preserving methods, nonresponse bias analysis, generalized regression estimation, imputation, and more.
Dr. Jane Li is a Westat senior statistician with 15 years of experience in all aspects of survey research. With a Ph.D. from the joint program in survey methodology at the University of Maryland, Jane's latest work includes sample design, nonresponse adjustment, imputation, variance estimation, and data confidentiality and disclosure protection.
Dr. Marcelo Simas is a Westat senior study director with more than 15 years of experience leading software development, including design, development, implementation, and operations of online GPS data processing and GIS analytics systems. He designs and develops architecture for comprehensive travel survey platforms that feature seamless integration of all aspects of data collection and management.
The team proposes a package of new methods involving differential privacy (DP) that protects microdata files containing numerical, geospatial, and categorical data from linkage and inferential attacks by intruders. A realistic approach is proposed for incorporating DP into the team's standard statistical disclosure control approaches for generating synthetic data. The approach includes research into generating synthetic microdata with additive DP noise in the estimating equations of sequential regression modelling for multiple imputation. The proposed solution addresses geospatial variables by implementing CART (classification and regression tree) methods, and it uses a remote analysis server as an output perturbation approach. Unique to the proposal is the application to real survey data with survey weights, for which the application of DP is an area of ongoing research. All proposed approaches, randomized mechanisms, and additive noise aim to balance privacy and utility for a range of statistical analyses. Because a DP mechanism is not secret, the parameters of the noise addition or randomized mechanism can be made public, and researchers can account for the perturbation in their statistical inferences.
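To illustrate the last point, the following is a minimal sketch of output perturbation with publicly known noise parameters. It is not the team's mechanism: it assumes a simple unweighted count query with sensitivity 1 and uses the standard Laplace mechanism, and the function names (`laplace_noise`, `dp_count`, `total_variance`) are hypothetical. Survey weights, which the proposal identifies as an open research area, are deliberately ignored here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate.

    The difference of two independent exponential variates with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise (illustrative only).

    Returns the noisy count together with the noise variance, which
    depends only on the public parameters `epsilon` and `sensitivity`
    and can therefore be published alongside the release.
    """
    scale = sensitivity / epsilon
    noisy = true_count + laplace_noise(scale, rng)
    noise_variance = 2.0 * scale ** 2  # variance of Laplace(0, scale)
    return noisy, noise_variance


def total_variance(sampling_variance, noise_variance):
    """Combine sampling variance with the independent, public noise variance."""
    return sampling_variance + noise_variance


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed only to make the demo repeatable
    noisy, noise_var = dp_count(true_count=100.0, epsilon=0.5, rng=rng)
    print("noisy count:", noisy)
    print("published noise variance:", noise_var)
    print("variance for inference:", total_variance(4.0, noise_var))
```

Because the noise is independent of the data and its variance is known, an analyst can add that variance to the usual sampling variance when building confidence intervals, rather than treating the released value as exact.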
The pragmatic solution comes from the statistical community and mitigates the impact of privacy constraints on the more widely used statistics and data products of national statistical institutes in particular and of data providers in general. A privacy mechanism generates public microdata and tables with numeric precision and categorizations sufficient to serve the purpose of the data release and to prevent violations of a privacy guarantee.
Simulations and systematic testing will make the privacy mechanism more robust to attacks. The design of the mechanism and the statistical and machine learning expertise of the developers make it possible to anticipate potential intruder attacks and to select the combinations of methods required to counter them.