On the hardness of learning sparse parities
Abstract
This work investigates the hardness of computing sparse solutions to systems of linear equations over F2. Consider the k-EVENSET problem: given a homogeneous system of linear equations over F2 on n variables, decide if there exists a nonzero solution of Hamming weight at most k (i.e. a k-sparse solution). While there is a simple O(nk/2)-time algorithm for it, establishing fixed parameter intractability for k-EVENSET has been a notorious open problem. Towards this goal, we show that unless k-CLIQUE can be solved in no(k) time, k-EVENSET has no polynomial time algorithm when k = ω(log2 n). Our work also shows that the non-homogeneous generalization of the problem - which we call k-VECTORSUM - is W[1]-hard on instances where the number of equations is O(k log n), improving on previous reductions which produced Ω(n) equations. We use the hardness of k-VECTORSUM as a starting point to prove the result for k-EVENSET, and additionally strengthen the former to show the hardness of approximately learning k-juntas. In particular, we prove that given a system of O(exp(O(k)) · logn) linear equations, it is W[1]-hard to decide if there is a k-sparse linear form satisfying all the equations or any function on at most k-variables (a k-junta) satisfies at most (1/2 + ϵ)-fraction of the equations, for any constant ϵ > 0. In the setting of computational learning, this shows hardness of approximate non-proper learning of k-parities. In a similar vein, we use the hardness of k-EVENSET to show that that for any constant d, unless k-CLIQUE can be solved in no(k) time, there is no poly(m, n) · 2o(√k) time algorithm to decide whether a given set of m points in F2n satisfies: (i) there exists a non-trivial k-sparse homogeneous linear form evaluating to 0 on all the points, or (ii) any non-trivial degree d polynomial P supported on at most k variables evaluates to zero on ≈ PrF2n[P(z) = 0] fraction of the points i.e., P is fooled by the set of points. Lastly, we study the approximation in the sparsity of the solution. Let the GAP-k-VECTORSUM problem be: given an instance of k-VECTORSUM of size n, decide if there exist a k-sparse solution, or every solution is of sparsity at least k′ = (1 + δ0)k. Assuming the Exponential Time Hypothesis, we show that for some constants c0, δ0 ≥ 0 there is no poly(n) time algorithm for GAP-k-VECTORSUM when k = ω((loglogn)co).