Privately Releasing Conjunctions and the Statistical Query Barrier
Anupam Gupta, Moritz Hardt, Aaron Roth, Jonathan Ullman
[arXiv]
Suppose we would like to know all answers to a set of statistical
queries C on a data set up to small error, but we can only access the
data itself using statistical queries. A trivial solution is to
exhaustively ask all queries in C: Can we do any better?
- We show that the number of statistical queries necessary and
sufficient for this task is—up to polynomial factors—equal to the
agnostic learning complexity of C in Kearns’ statistical query (SQ)
model. This gives a complete answer to the question when running time
is not a concern.
- We then show that the problem can be solved efficiently (allowing
arbitrary error on a small fraction of queries) whenever the answers to
C can be described by a submodular function. This includes many natural
concept classes, such as graph cuts and Boolean disjunctions and
conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy preserving data analysis: Here, our second
result leads to the first algorithm that efficiently releases
differentially private answers to all Boolean conjunctions with 1%
average error. This presents significant progress on a key open problem
in privacy-preserving data analysis. Our first result on the other hand
gives unconditional lower bounds on any differentially private algorithm
that admits a (potentially non-privacy-preserving) implementation using
only statistical queries. Not only our algorithms, but also most known
private algorithms can be implemented using only statistical queries,
and hence are constrained by these lower bounds. Our result therefore
isolates the complexity of agnostic learning in the SQ-model as a new
barrier in the design of differentially private algorithms.