Beating the direct sum theorem in communication complexity with implications for sketching
Abstract
A direct sum theorem for two parties and a function $f$ states that the communication cost of solving $k$ copies of $f$ simultaneously with error probability $1/3$ is at least $k \cdot R_{1/3}(f)$, where $R_{1/3}(f)$ is the communication required to solve a single copy of $f$ with error probability $1/3$. We improve this for a natural family of functions $f$, showing that the 1-way communication required to solve $k$ copies of $f$ simultaneously with probability $2/3$ is $\Omega(k \cdot R_{1/k}(f))$. Since $R_{1/k}(f)$ may be as large as $\Omega(R_{1/3}(f) \cdot \log k)$, we asymptotically beat the direct sum bound for such functions, showing that the trivial upper bound of solving each of the $k$ copies of $f$ with probability $1 - O(1/k)$ and taking a union bound is optimal! To achieve this, our direct sum uses a novel measure of information cost under which a protocol may abort with constant probability but otherwise must be correct with very high probability. Moreover, for the functions considered, we show strong lower bounds on the communication cost of protocols with these relaxed guarantees; indeed, our lower bounds match those for protocols that are not allowed to abort. In the distributed and streaming models, where one wants to be correct not only on a single query but simultaneously on a sequence of $n$ queries, we obtain optimal lower bounds on the communication or space complexity. Lower bounds obtained from our direct sum result show that a number of techniques in the sketching literature are optimal, including the following:
• (JL transform) Lower bound of $\Omega(\varepsilon^{-2} \log(n/\delta))$ on the dimension of (oblivious) Johnson-Lindenstrauss transforms.
• (ℓp-estimation) Lower bound of $\Omega(n \varepsilon^{-2} \log(n/\delta) (\log d + \log M))$ on the size of encodings of $n$ vectors in $[\pm M]^d$ that allow $\ell_1$- or $\ell_2$-estimation.
• (Matrix sketching) Lower bound of $\Omega(\varepsilon^{-2} \log(n/\delta))$ on the dimension of a matrix sketch $S$ satisfying the entrywise guarantee $|(A S S^T B)_{i,j} - (AB)_{i,j}| \le \varepsilon \|A_i\|_2 \|B_j\|_2$.
• (Database joins) Lower bound of $\Omega(n \varepsilon^{-2} \log(n/\delta) \log M)$ for sketching frequency vectors of $n$ tables in a database, each with $M$ records, in order to allow join size estimation.
Copyright © SIAM.
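The "trivial upper bound" that the abstract shows to be optimal comes from a standard amplification step followed by a union bound. The sketch below illustrates only that arithmetic, not the paper's lower-bound construction. It makes three assumptions:

- A 1-way protocol for one copy of $f$ is a black box with error at most $1/3$.
- Independent repetitions are combined by a majority vote.
- The error of the majority vote is bounded with Hoeffding's inequality, $\Pr[\text{majority wrong}] \le \exp(-2t(1/2 - p)^2)$.

The helper names `repetitions_for_error` and `union_bound_cost` are hypothetical.

```python
import math


def repetitions_for_error(target_error: float, base_error: float = 1 / 3) -> int:
    """Hypothetical helper: number t of independent runs so that a majority
    vote over t runs of a protocol with error base_error < 1/2 is wrong with
    probability at most target_error.

    Hoeffding: P[majority wrong] <= exp(-2 * t * (1/2 - base_error)^2),
    so t >= ln(1 / target_error) / (2 * (1/2 - base_error)^2) suffices.
    """
    gap = 0.5 - base_error
    if gap <= 0:
        raise ValueError("base_error must be strictly below 1/2")
    return math.ceil(math.log(1 / target_error) / (2 * gap * gap))


def union_bound_cost(k: int, single_copy_cost: float, total_error: float = 1 / 3):
    """Hypothetical helper: cost of the trivial protocol for k copies of f.

    Each copy is amplified to error total_error / k, i.e. error O(1/k).
    By a union bound, all k answers are then correct with probability at
    least 1 - total_error. Returns (repetitions per copy, total cost).
    """
    per_copy_error = total_error / k
    t = repetitions_for_error(per_copy_error)
    return t, k * t * single_copy_cost


if __name__ == "__main__":
    for k in (10, 100, 1000):
        t, cost = union_bound_cost(k, single_copy_cost=1.0)
        print(f"k={k}: {t} repetitions per copy, total cost {cost}")
```

The number of repetitions per copy grows like $O(\log k)$. This reflects $R_{1/k}(f) = O(R_{1/3}(f) \cdot \log k)$, so the trivial protocol costs $O(k \cdot R_{1/3}(f) \cdot \log k)$ in total. The abstract's $\Omega(k \cdot R_{1/k}(f))$ lower bound shows this cannot be improved for the functions considered.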